Read This First

This preface summarizes the chapters, lists related documentation, and 
describes the style and symbol conventions used in this manual. 


How This Manual Is Organized

This document contains the following chapters:

Chapter 1 Introduction
An introduction to SuperSPARC-based systems.

Chapter 2 Summary of the SPARC Architecture
Summarizes the major features of the SPARC Version 8 architecture.

Chapter 3 SuperSPARC Introduction
Contains an introduction to the SuperSPARC processor, a highly integrated, high-performance implementation of the SPARC architecture.

Chapter 4 Register Summary
Summarizes the registers in the SuperSPARC processor's integer and floating-point units.

Chapter 5 Principles of Operation
Describes the SuperSPARC processor's pipeline operation and how to use it efficiently.

Chapter 6 Code-Generation Principles
Presents guidelines for increasing the performance of programs on the SuperSPARC processor.

Chapter 7 Instructions
Outlines the instructions that expand upon the SPARC Version 8 instruction set.

Chapter 8 Memory Model
Describes the memory in a SuperSPARC-based system.

Chapter 9 MMU Operation
Describes the SuperSPARC processor's MMU, which is compatible with the SPARC Reference MMU Specification.

Chapter 10 Caches/Store Buffer
Describes SuperSPARC's internal caches and store buffer.

Chapter 11 Floating-Point Unit Operation
Describes the floating-point operations, concentrating on the floating-point-queue interface, special numeric cases, and floating-point exceptions.

Chapter 12 Traps
Gives an overall view of the types of traps that may occur during normal or exceptional operation of SuperSPARC.

Chapter 13 Reset
Describes how SuperSPARC implements various forms of reset.

Chapter 14 Startup Procedure
Gives an example of code from a reset handler. The example shows critical initializations required for proper operation of the SuperSPARC processor.

Chapter 15 Diagnostic Operation
Details software debugging capabilities and external monitors.

Chapter 16 MultiCache Controller (MXCC)
Describes the MultiCache Controller, an optional external cache controller for SuperSPARC processors.

Chapter 17 MBus
Details the operation of the SPARC MBus on the SSP and the MXCC. MBus is a high-speed interface that connects SPARC processor modules to physical memory and I/O modules.

Chapter 18 VBus
Details the operation of the VBus interface used between SuperSPARC and the MXCC.

Chapter 19 XBus
Details the operation of the XBus packet bus interface used for multiprocessor SuperSPARC systems.

Chapter 20 BootBus
Describes the BootBus, a simple synchronous 12-pin interface provided by the MXCC for accessing an EPROM for bootstrap loading and for accessing other low-speed peripherals.

Chapter 21 JTAG Serial Scan Interface
Explains how the IEEE 1149.1 JTAG serial scan interface mechanism supports observation and control of the SuperSPARC processor for different applications.

Chapter 22 Scan-Based Debug
Explains the use of the IEEE 1149.1 JTAG serial scan interface for software debugging.

Chapter 23 Clocking
Describes the essential clock requirements of SuperSPARC and the MXCC.

Chapter 24 MBus Module
Describes an example application of a plug-in MBus module.

Appendix A Instruction Summary
Contains a summary of SuperSPARC's instruction set.

Appendix B ASI/Diagnostic Access
Provides a table of the SuperSPARC ASI assignments.

Appendix C SuperSPARC Processor Pin Description Tables
Contains tables describing each pin on the SSP.

Appendix D MultiCache Controller Pin Description Tables
Contains tables describing each pin on the MXCC.

Appendix E SuperSPARC Revision Summary
Contains a table describing salient features of SuperSPARC revisions.

Appendix F MultiCache Controller Revision Summary
Contains a table describing salient features of MultiCache Controller revisions.

Appendix G Glossary
Contains a glossary of important terms and acronyms used in this book.


Related Documentation
The following related documents are also available:
- SuperSPARC (STP1020N, STP1020, STP1020A) Data Sheet
- MultiCache Controller (STP1090, STP1090A) Data Sheet
- The SPARC Architecture Manual, Version 8
- SPARC MBus Interface Specification
Style and Symbol Conventions
This document uses the following conventions: 


- Program listings, program examples, and interactive displays are shown in a special font. Examples use a bold version of the special font for emphasis. Here is a sample program listing:

add %l0, %l1, %l2
sub %o1, %o2, %o3
and %o3, %o4, %o5
ld [%l2 + 0x10], %l3




Hexadecimal numbers are written in the 0xnnn form. Thus, 31 decimal is written in hexadecimal as 0x1F.


Register fields are shown in the form REGISTER.FIELD. For example, the Current Window Pointer field of the Processor State Register is written as PSR.CWP.
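As a sketch of the REGISTER.FIELD notation, the C fragment below extracts a few PSR fields from a 32-bit PSR image. It assumes the PSR layout given in The SPARC Architecture Manual, Version 8 (CWP in bits 4:0, S in bit 7, PIL in bits 11:8, EF in bit 12); the helper names are hypothetical and are not part of any SuperSPARC software interface.

```c
#include <stdint.h>

/* Return `width` bits of `reg` starting at bit `lsb` (width < 32). */
static inline uint32_t reg_field(uint32_t reg, unsigned lsb, unsigned width)
{
    return (reg >> lsb) & ((1u << width) - 1u);
}

/* PSR.CWP: current window pointer, bits 4:0 (assumed V8 layout). */
static inline uint32_t psr_cwp(uint32_t psr) { return reg_field(psr, 0, 5); }

/* PSR.S: supervisor mode, bit 7. */
static inline uint32_t psr_s(uint32_t psr) { return reg_field(psr, 7, 1); }

/* PSR.PIL: processor interrupt level, bits 11:8. */
static inline uint32_t psr_pil(uint32_t psr) { return reg_field(psr, 8, 4); }

/* PSR.EF: enable floating-point, bit 12. */
static inline uint32_t psr_ef(uint32_t psr) { return reg_field(psr, 12, 1); }
```

For example, under these assumptions a PSR image of 0x00001FA3 gives PSR.CWP = 3, PSR.S = 1, PSR.PIL = 15, and PSR.EF = 1.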


Register fields named in ALL CAPITALS are both readable and writable by software. Register fields named in lowercase letters are readable by software but are set by hardware; writing to these fields does not necessarily change their value. For example, TBR.TBA is set by system software, while TBR.tt is set by hardware on receipt of a trap.


Register Conventions. The following are the conventions used in naming 
specific register fields. 


reserved, res, r  reserved for definition in future versions of the SPARC architecture. A reserved field should always be written as zeros by software, but software should not assume that a reserved field will read as zeros.


unused, u  used to describe a register field that is not currently defined by the architecture. An unused field should always be written as zeros by software, but a read of an unused field returns an undefined value.


zero, 0  used to describe a register field that always reads as zero. A write of any other value to a zero field has no effect on a subsequent read, which returns zero.
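The reserved and unused rules above are usually honored with a read-modify-write that forces those bits to zero before the value is written back. The sketch below assumes, as in the SPARC Version 8 PSR layout, that bits 19:14 of the PSR are reserved; the mask and function names are hypothetical.

```c
#include <stdint.h>

/* Assumed reserved bits of the PSR: bits 19:14 (SPARC V8 layout). */
#define PSR_RESERVED_MASK 0x000FC000u

/* Build the value to write back: replace only the bits selected by `mask`
   with the corresponding bits of `value`, then force the reserved bits to
   zero, because software must not rely on what they read back as. */
static inline uint32_t psr_prepare_write(uint32_t old_psr,
                                         uint32_t mask, uint32_t value)
{
    uint32_t updated = (old_psr & ~mask) | (value & mask);
    return updated & ~PSR_RESERVED_MASK;
}
```

Clearing the reserved bits unconditionally, rather than copying whatever was read, follows the rule that a reserved field is always written as zeros.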
Following are other symbols and abbreviations used throughout this document.

Abbreviation  Definition
ALU           Arithmetic Logic Unit
EPROM         Erasable Programmable Read-Only Memory
JTAG          Joint Test Action Group
LSB           Least Significant Byte (or Bit)
MMU           Memory Management Unit
MOESI         Modified, Owned, Exclusive, Shared, Invalid
MXCC          MultiCache Controller
NOP           No Operation
nPC           next Program Counter
PA            Physical Address
PIL           Processor Interrupt Level
POST          Power-On Self Test
PPN           Physical Page Number
PTE           Page Table Entry
RISC          Reduced Instruction Set Computer
SO            Sequential Ordering
TLB           Translation Lookaside Buffer
VA            Virtual Address
Chapter 1
Introduction

The Sparc Technology Business SuperSPARC chipset comprises the SuperSPARC processor (SSP) and the optional MultiCache Controller (MXCC). This chapter describes the key components, features, and configurations of the chipset.


1.1 Introduction to SuperSPARC Chipset


The SuperSPARC chipset, which comprises the SuperSPARC advanced microprocessor and the MultiCache Controller, combines STB's strengths in high-performance semiconductor processes and Sun Microsystems' expertise in sophisticated computer systems.


1.1.1 Key Components


SuperSPARC Microprocessor 


The processor's 3.1 million transistors form a highly integrated processing subsystem containing:
- Superscalar integer unit.
- Double-precision IEEE floating-point unit.
- 20K-byte instruction cache.
- 16K-byte data cache.
- SPARC Reference memory management unit (MMU).
- Eight-entry store buffer.
- Dual-mode bus interface.
- IEEE 1149.1 JTAG test and debug.


SuperSPARC MultiCache Controller

The MXCC is an optional external cache controller for the SuperSPARC processor (SSP). It is employed when a large secondary cache or an interface to a non-MBus system is required. The MXCC contains:
- Cache tags and control for 0.5 to 2M bytes of external cache memory.
- Synchronization logic to allow the processor to be clocked faster than the system bus.
- Block copy/fill logic.
- Dual-bus interface:
  - MBus (CMOS).
  - XBus (GTL).
1.1.2 Features


The chipset's features aid in building a wide range of high-performance workstations and server computers.


- SPARC Compatibility
  The SSP is fully compatible with the SPARC Architecture, version 8, from SPARC International. Compatibility ensures that thousands of SPARC application programs run on SuperSPARC-based systems.

- Superscalar Execution
  The processor can execute up to three instructions per clock cycle. This internal parallelism accelerates all applications while allowing system clock rates to remain manageable. Even programs that have not been recompiled for superscalar execution run faster on SuperSPARC.

- Built-in MBus Multiprocessing
  The chipset supports the SPARC standard MBus directly and, in several configurations, connects directly to MBus without any glue logic. Each chip contains all logic necessary for shared-memory multiprocessing with fully coherent caches on MBus.

- High Integration
  The SuperSPARC chipset simplifies system design by integrating the components of a high-performance processing subsystem onto very few chips.

- Multiple Configurations
  The processor can be used alone or with the MXCC and cache RAMs. The chipset can be used in uniprocessor or multiprocessor systems and supports MBus systems and systems based on customer-defined buses.
1.2 System Configurations


The chipset supports the configurations shown in Table 1-1. This section describes stand-alone and high-performance MBus and XBus configurations. The minimum MBus and XBus configurations offer few advantages.


Table 1-1. SuperSPARC Chipset Configurations

1.2.1 MBus Configurations


MBus is a SPARC standard that connects high-performance processors to memory and maintains coherent memory and caches in shared-memory multiprocessors. Figure 1-1 shows a typical MBus system.

Both the stand-alone and high-performance MBus configurations are available as SPARC-standard MBus plug-in modules. The modules allow plug-in performance upgrades.


Figure 1-1. Typical MBus System

In stand-alone configurations, the processor connects directly to MBus, as shown in Figure 1-2. Because this configuration does not use the MXCC or cache RAM, it is cost-effective. For most applications, using more than two processors saturates the MBus bandwidth. MBus supplies the processor clock.


Figure 1-2. Stand-Alone Configuration 


The high-performance configuration is diagrammed in Figure 1-3. The external cache memory provides a significant performance improvement and greatly decreases bus traffic, so that more processors can be supported on a system bus. The SRAMs are industry-standard 128K-bit x 9 synchronous SRAMs. The MXCC allows the processor to be clocked faster than the system bus.


Figure 1-3. High-Performance Configuration 
1.2.2 XBus Configurations


To support systems that do not use MBus, the MXCC supports external system bus interfaces, or bus watchers (BWs), in its XBus configurations. A BW interfaces the MXCC to a particular system bus. Figure 1-4 shows the high-performance XBus configuration.


The synchronous packet-switched XBus was developed by Xerox PARC and extends the functions of the MXCC to a BW. The bus uses GTL, a low-voltage electrical interface, for the high speed and noise immunity required in massively parallel configurations.


The MXCC supports up to four external BWs. Each BW interfaces with a system bus, so up to four system buses can be used to increase the available bandwidth between processors and memory. Multiple system buses can support a large number of SuperSPARC processors, up to 64 for some applications.


Figure 1-4. XBus System with External Bus Watchers

Chapter 2
Summary of the SPARC Architecture
The SuperSPARC processor (SSP) is an implementation of The SPARC Architecture, version 8. This chapter summarizes the major features of the SPARC architecture; for a complete description, consult The SPARC Architecture Manual, version 8.
2.1 Introduction


The SSP implements the SPARC Architecture, version 8, and a SPARC Reference Memory Management Unit (MMU) as specified in The SPARC Architecture Manual. SPARC is a standard, and, while SuperSPARC meets the SPARC standard, it does not implement all of the optional features (e.g., quadruple-precision floating-point).


The material in this chapter is adapted from The SPARC Architecture Manual, 
copyright 1991 by SPARC International, Inc., based on technology developed 
by Sun Microsystems, Inc. Used by permission. 


These are the principal features of SPARC, version 8:
- A linear, 32-bit address space.
- A small number of simple instruction formats.
- Load/store architecture (no memory/operate instructions).
- Three register addresses (two operands and a separate destination).
- Large-windowed register file.
- Separate floating-point register file.
- Delayed control transfer.
- Fast trap handling.
- Multiprocessor synchronization instructions.
- Tagged data instructions.
- Optional coprocessor support (not supported on SuperSPARC).
- Three multiprocessor memory models.


SPARC has an instruction set architecture with 32-bit integer and 32-, 64-, and 128-bit IEEE Standard 754 floating-point arithmetic as its principal data types. It defines general-purpose integer, floating-point, and special state/status registers and 69 basic instruction operations. These instructions are all encoded in 32-bit-wide instruction formats. The load/store instructions address a linear, 2^32-byte address space.


A SPARC processor is logically composed of the following, each with its own registers:
- An integer unit (IU),
- A floating-point unit (FPU), and
- An optional coprocessor (CP).


2.1.1 Integer Unit


IU and FPU registers are 32 bits wide. Instruction operands are generally single registers or register pairs. The processor can be in either of two modes: user or supervisor. In supervisor mode, the processor can execute any instruction, including the privileged (supervisor-only) instructions. In user mode, an attempt to execute a privileged instruction will cause a trap to supervisor software. User-application programs execute while the processor is in user mode.


The IU contains the general-purpose registers and controls the overall operation of the processor. The IU executes the integer arithmetic instructions and computes memory addresses for loads and stores. It also maintains the program counters and controls instruction execution for the FPU and the CP.


An implementation can contain from 40 to 520 general-purpose 32-bit r registers. This corresponds to a grouping of the registers into eight global r registers, plus a circular stack of from 2 to 32 sets of 16 registers each, known as register windows (see Section 4.1). Since the number of register windows present (NWINDOWS) is implementation-dependent, the total number of registers is also implementation-dependent.


SuperSPARC has eight register windows, for a total of 136 r registers. 
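The register counts quoted above follow from 8 globals plus 16 registers per window; the out registers of each window are the in registers of its neighbor, so they are not counted twice. A minimal C check of that arithmetic (the function name is hypothetical):

```c
/* Total r registers: 8 globals plus 16 (ins + locals) per window.
   Outs are shared with the adjacent window, so they add nothing. */
static inline unsigned total_r_registers(unsigned nwindows)
{
    return 8u + 16u * nwindows;
}
```

With NWINDOWS = 2, 8, and 32 this yields 40, 136, and 520 registers, matching the range given above and SuperSPARC's total.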


An instruction can access the eight global registers and the current window of 24 r registers. A window of 24 registers is composed of a 16-register set, divided into eight in and eight local registers, together with the eight in registers of an adjacent register set, which are addressable from the current window as out registers. The current window is specified by the current window pointer (CWP) field in the processor state register (PSR). Window overflow and underflow are detected via the window invalid mask (WIM) register, which is controlled by supervisor software. The actual number of windows in a SPARC implementation is invisible to a user-application program.
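The window movement can be sketched in C. The sketch assumes the SPARC Version 8 rules: SAVE moves CWP to (CWP - 1) modulo NWINDOWS, RESTORE moves it to (CWP + 1) modulo NWINDOWS, and the move traps (window_overflow or window_underflow) when the destination window's WIM bit is set. The type and function names are hypothetical, and the sketch only reports the trap rather than modeling trap entry.

```c
#include <stdbool.h>
#include <stdint.h>

#define NWINDOWS 8u /* SuperSPARC implements eight register windows */

/* Outcome of a SAVE or RESTORE attempt. */
typedef struct {
    unsigned new_cwp; /* window the instruction would move to */
    bool trap;        /* true if that window is marked invalid in WIM */
} window_move;

/* SAVE: decrement CWP modulo NWINDOWS; WIM bit set means window_overflow. */
static window_move do_save(unsigned cwp, uint32_t wim)
{
    window_move m;
    m.new_cwp = (cwp + NWINDOWS - 1u) % NWINDOWS;
    m.trap = ((wim >> m.new_cwp) & 1u) != 0u;
    return m;
}

/* RESTORE: increment CWP modulo NWINDOWS; WIM bit set means window_underflow. */
static window_move do_restore(unsigned cwp, uint32_t wim)
{
    window_move m;
    m.new_cwp = (cwp + 1u) % NWINDOWS;
    m.trap = ((wim >> m.new_cwp) & 1u) != 0u;
    return m;
}
```

Because the windows form a circular stack, a SAVE from window 0 wraps to window 7, and supervisor software typically marks one window invalid in WIM so that the wrap is caught.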


When the IU accesses memory, it appends an address space identifier (ASI) to the address. The ASI encodes whether the processor is in supervisor or user mode and whether the access is to instruction memory or to data memory. Supervisor programs can access program-controlled address spaces by using the Load/Store ASI instructions.
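For normal (non-alternate) accesses, the Version 8 architecture assigns four default ASI values. The sketch below assumes those assignments (0x08 user instruction, 0x09 supervisor instruction, 0x0A user data, 0x0B supervisor data); the enum and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Default ASIs assumed from SPARC V8 for non-alternate accesses. */
enum {
    ASI_USER_INSN  = 0x08,
    ASI_SUPER_INSN = 0x09,
    ASI_USER_DATA  = 0x0A,
    ASI_SUPER_DATA = 0x0B
};

/* Select the ASI appended to an access from the mode (PSR.S) and
   whether the access is an instruction fetch or a data access. */
static inline uint8_t default_asi(bool supervisor, bool instruction_fetch)
{
    if (instruction_fetch)
        return supervisor ? ASI_SUPER_INSN : ASI_USER_INSN;
    return supervisor ? ASI_SUPER_DATA : ASI_USER_DATA;
}
```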


2.1.2 Floating-Point Unit (FPU) 


The FPU has 32 32-bit floating-point f registers. Double-precision values occupy an even-odd pair of registers. Thus, the floating-point registers can hold a maximum of 32 single-precision, 16 double-precision, or 8 quad-precision values.
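Register-number alignment for multi-word operands can be checked as below. This is a sketch under the assumption that a double occupies an even-numbered register and the following odd one, and a quad occupies four registers starting at a multiple of four; how an implementation treats a misaligned register number is not modeled. Function names are hypothetical.

```c
#include <stdbool.h>

/* words: operand size in 32-bit f registers (1 single, 2 double, 4 quad).
   A multi-word operand must start on a multiple of its size and fit
   within the 32 registers. */
static inline bool freg_aligned(unsigned reg, unsigned words)
{
    return reg < 32u && reg % words == 0u && reg + words <= 32u;
}

/* Number of values of the given size that the 32 f registers can hold. */
static inline unsigned freg_capacity(unsigned words)
{
    return 32u / words;
}
```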
Floating-point load/store instructions are used to move data between the FPU and memory. The memory address is calculated by the IU. Floating-point operate (FPop) instructions perform the actual floating-point arithmetic.


The floating-point data formats and instruction sets conform to the IEEE Standard for binary floating-point arithmetic, ANSI/IEEE 754-1985. However, SPARC does not require that all aspects of the standard, such as gradual underflow, be implemented in hardware. An alternate method of indicating that a floating-point instruction failed to produce a correct ANSI/IEEE Standard 754-1985 result is to generate a special floating-point unfinished or unimplemented exception. Software must emulate any functionality not present in the hardware. If an FPU is not present, or if the enable floating-point bit in the PSR (PSR.EF) is 0, an attempt to execute a floating-point instruction generates an fp_disabled trap. In either of these cases, software must emulate the trapped floating-point instruction.


SuperSPARC never generates a floating-point unfinished trap, because it handles gradual underflow in hardware. SuperSPARC implements all 32-bit single-precision and 64-bit double-precision floating-point instructions. It implements no 128-bit quad-precision floating-point instructions; quad-precision instructions trap with a floating-point unimplemented exception.


2.1.3 Coprocessor (CP) 


While the SPARC architecture supports an optional coprocessor, SuperSPARC contains no provisions for a coprocessor.
2.2 Instructions

Instructions fall into six basic categories:
- Load/store.
- Arithmetic/logical/shift.
- Control transfer.
- Read/write.
- Floating-point operate.
- Coprocessor operate.

2.2.1 Load/Store Instructions

Load/store instructions are the only instructions that access memory. They use two r registers, or an r register and a signed 13-bit immediate value, to calculate a 32-bit, byte-aligned memory address. The destination field of the load/store instruction specifies an r register, f register, or coprocessor register that supplies the data for a store or receives the data from a load.
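A sketch of the address calculation, assuming the usual Version 8 encoding in which the immediate form adds the sign-extended 13-bit simm13 field to r[rs1]. The function names are hypothetical; arithmetic wraps modulo 2^32, like the 32-bit address space.

```c
#include <stdint.h>

/* Sign-extend the low 13 bits of `field` (the simm13 immediate). */
static inline int32_t sign_extend13(uint32_t field)
{
    field &= 0x1FFFu;                            /* keep 13 bits */
    return (int32_t)(field ^ 0x1000u) - 0x1000;  /* flip, then subtract sign */
}

/* Register + register form: r[rs1] + r[rs2], modulo 2^32. */
static inline uint32_t ea_reg(uint32_t rs1, uint32_t rs2)
{
    return rs1 + rs2;
}

/* Register + immediate form: r[rs1] + sign_extend(simm13), modulo 2^32. */
static inline uint32_t ea_imm(uint32_t rs1, uint32_t simm13)
{
    return rs1 + (uint32_t)sign_extend13(simm13);
}
```

The immediate therefore reaches from -4096 to +4095 bytes around the base register.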


The processor appends an ASI to every access. This ASI is derived in one of two ways: directly from a Load/Store ASI instruction, or from a default ASI that is generated based on user data access or supervisor data access. This ASI is used by the MMU and/or the system.


Integer load and store instructions support byte, half-word (16-bit), word (32-bit), and double-word (64-bit) accesses. Versions of integer load instructions perform sign-extension on 8- and 16-bit values as the values are loaded into the destination register. Floating-point and coprocessor load and store instructions support word and double-word memory accesses.
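The difference between the sign-extending and zero-filling byte and half-word loads can be sketched as follows. The function names borrow the Version 8 mnemonics (LDSB, LDSH, LDUB, LDUH) for readability only; they model the value written to the destination register, not the instructions themselves.

```c
#include <stdint.h>

/* Unsigned loads: the upper bits of the destination are filled with zeros. */
static inline uint32_t ldub(uint8_t b)  { return b; }
static inline uint32_t lduh(uint16_t h) { return h; }

/* Signed loads: the top bit of the loaded value is copied into the upper
   bits. Computed with unsigned arithmetic to avoid implementation-defined
   conversions. */
static inline uint32_t ldsb(uint8_t b)  { return ((uint32_t)b ^ 0x80u) - 0x80u; }
static inline uint32_t ldsh(uint16_t h) { return ((uint32_t)h ^ 0x8000u) - 0x8000u; }
```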


Alignment Restrictions


Half-word accesses must be aligned on two-byte boundaries, word accesses 
must be aligned on four-byte boundaries, and double-word accesses must be 
aligned on eight-byte boundaries. An improperly aligned address in a load or 
store instruction causes a trap. 
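The alignment rule reduces to a mask test: an access of 2^k bytes is aligned when the low k address bits are zero. A minimal sketch (the function name is hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* size: access width in bytes (1, 2, 4, or 8). The address is aligned when
   it is a multiple of the size, i.e. its low log2(size) bits are zero. */
static inline bool is_aligned(uint32_t addr, uint32_t size)
{
    return (addr & (size - 1u)) == 0u;
}
```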


Addressing Conventions 


SPARC is a "big-endian" architecture: the address of a double-word, word, or half-word is the address of its most significant byte. Increasing the address means decreasing the significance of the unit being accessed. Addressing conventions are illustrated in Figure 2-1.
Figure 2-1. Addressing Conventions
[Figure: byte, half-word, and double-word layouts; bytes at address<1:0> = 0, 1, 2, 3; half-words at address<1:0> = 0, 2]
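The big-endian convention can be sketched in C by assembling values from bytes in address order, so the byte at the lowest address lands in the most significant position. The function names are hypothetical, and the code gives the same result regardless of the host's own byte order.

```c
#include <stdint.h>

/* Word at p: byte p[0] (lowest address) is the most significant. */
static inline uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Half-word at p: byte p[0] is the most significant. */
static inline uint16_t load_be16(const uint8_t *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}
```

For memory bytes 0x12, 0x34, 0x56, 0x78 at addresses 0 through 3, the word at address 0 is 0x12345678 and the half-word at address 2 is 0x5678.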


Load/Store Alternate 


Special, privileged versions of the load and store integer instructions (the load/store alternate instructions) can directly specify an arbitrary eight-bit address space identifier for the load/store data access. The privileged load/store alternate instructions can be used by supervisor software to access special protected registers, such as MMU, cache-control, and processor-control registers, and other processor- or system-dependent values.


Separate I&D Memories


Most specifications in The SPARC Architecture Manual are written as if store instructions wrote to the same memory from which instructions were accessed. However, an implementation may explicitly partition instructions and data into independent instruction and data memories (caches), commonly referred to as a "Harvard" architecture or "split I & D caches." If a program includes self-modifying code, it must issue FLUSH instructions (or supervisor calls that have an equivalent effect) for the addresses to which new instructions are written. A FLUSH instruction ensures that the data previously written by a store instruction is seen by subsequent instruction fetches from the given address.


2.2.2 Arithmetic/Logical/Shift


The arithmetic/logical/shift instructions perform arithmetic, tagged-arithmetic, logical, and shift operations. With one exception, these instructions compute a result that is a function of the two source operands; the result is either written into a destination register or discarded. The exception is a specialized instruction, SETHI, which (along with a second instruction) can be used to create a 32-bit constant in an r register. Shift instructions can be used to shift the contents of an r register left or right by a given distance. The shift distance can be specified by a constant in the instruction or by the contents of an r register.
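Constructing a 32-bit constant usually takes the two-instruction sequence `sethi %hi(x), %g1` followed by `or %g1, %lo(x), %g1` (the register choice is arbitrary). The C sketch below models the arithmetic under the Version 8 definition that SETHI places a 22-bit immediate in bits 31:10 and clears bits 9:0; the helper names mirror the assembler's %hi and %lo operators but are hypothetical.

```c
#include <stdint.h>

/* %hi(x): the upper 22 bits of x, used as the SETHI immediate. */
static inline uint32_t hi22(uint32_t x) { return x >> 10; }

/* %lo(x): the lower 10 bits of x, small enough for a simm13 field. */
static inline uint32_t lo10(uint32_t x) { return x & 0x3FFu; }

/* SETHI: write imm22 into bits 31:10 and zero bits 9:0. */
static inline uint32_t sethi(uint32_t imm22) { return (imm22 & 0x3FFFFFu) << 10; }

/* sethi %hi(x), r  followed by  or r, %lo(x), r  leaves x in r. */
static inline uint32_t build_constant(uint32_t x)
{
    return sethi(hi22(x)) | lo10(x);
}
```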


The integer multiply instructions perform a signed or unsigned 32 × 32 → 64-bit multiplication. The integer division instructions perform a signed or unsigned 64 ÷ 32 → 32-bit division. Versions of multiply and divide set the condition codes. Division by zero causes a trap.
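The 64-bit product and dividend are split across the Y register and an r register. The sketch below assumes the Version 8 unsigned forms: UMUL places the low word of the product in rd and the high word in Y, and UDIV divides the 64-bit value Y:r[rs1] by a 32-bit divisor, saturating to 0xFFFFFFFF when the quotient does not fit. Function names are hypothetical; a zero divisor is treated as a precondition violation here, whereas the hardware takes a division_by_zero trap.

```c
#include <assert.h>
#include <stdint.h>

/* UMUL: low 32 bits of the 64-bit product go to rd. */
static inline uint32_t umul_rd(uint32_t a, uint32_t b)
{
    return (uint32_t)((uint64_t)a * b);
}

/* UMUL: high 32 bits of the 64-bit product go to Y. */
static inline uint32_t umul_y(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* UDIV: (Y:rs1) / divisor, saturated to 32 bits on overflow. */
static inline uint32_t udiv_q(uint32_t y, uint32_t rs1, uint32_t divisor)
{
    assert(divisor != 0u); /* hardware traps instead of returning */
    uint64_t dividend = ((uint64_t)y << 32) | rs1;
    uint64_t q = dividend / divisor;
    return q > 0xFFFFFFFFu ? 0xFFFFFFFFu : (uint32_t)q;
}
```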


The tagged-arithmetic instructions assume that the least significant two bits 
of the operands are data-type "tags." When there is an arithmetic overflow or 
if any of the operands' tag bits are nonzero, these instructions set the overflow 
condition code bit. Some versions of tagged arithmetic instructions trap when 
either of these conditions occurs. 
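A sketch of the tagged-add condition in C, assuming TADDcc semantics as described above: the overflow bit (V) is set on 32-bit signed overflow or when either operand has a nonzero tag in bits 1:0. The names are hypothetical, and the trapping variant is not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t sum; /* 32-bit result written to rd */
    bool v;       /* overflow condition code */
} tagged_result;

static inline tagged_result tadd(uint32_t a, uint32_t b)
{
    tagged_result r;
    r.sum = a + b;
    /* Signed overflow: operands share a sign that the sum does not. */
    bool ovf = (((~(a ^ b) & (a ^ r.sum)) >> 31) & 1u) != 0u;
    /* Tag check: either operand has a nonzero low two bits. */
    bool tag = ((a | b) & 3u) != 0u;
    r.v = ovf || tag;
    return r;
}
```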


2.2.3 Control Transfer 


Control-transfer instructions (CTIs) include program counter (PC)-relative branches and calls, register-indirect jumps, and conditional traps. Most of the control-transfer instructions are delayed control-transfer instructions (DCTIs), in which the instruction immediately following the DCTI is executed before the control transfer to the target address is completed.
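The target arithmetic and annul rule for the delayed control-transfer instructions described in this section can be sketched in C. The sketch assumes Version 8 encodings: a branch carries a 22-bit signed word displacement (so its reach is about ±8M bytes) and CALL a 30-bit word displacement that, once scaled by 4, spans the full 32-bit address space. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend the low n bits of field (1 <= n <= 31). */
static inline int32_t sext(uint32_t field, unsigned n)
{
    uint32_t sign = 1u << (n - 1u);
    field &= (1u << n) - 1u;
    return (int32_t)(field ^ sign) - (int32_t)sign;
}

/* Branch target: PC + 4 * sign_extend(disp22), modulo 2^32. */
static inline uint32_t branch_target(uint32_t pc, uint32_t disp22)
{
    return pc + ((uint32_t)sext(disp22, 22) << 2);
}

/* CALL target: PC + 4 * disp30. The scaled field already spans 32 bits,
   so wrapping modulo 2^32 reaches any word-aligned address. */
static inline uint32_t call_target(uint32_t pc, uint32_t disp30)
{
    return pc + ((disp30 & 0x3FFFFFFFu) << 2);
}

/* Is the delay instruction executed? With the annul bit set, a conditional
   branch annuls it when not taken; branch-always annuls it when taken. */
static inline bool delay_executed(bool annul, bool branch_always, bool taken)
{
    if (!annul)
        return true;
    if (branch_always)
        return false;
    return taken;
}
```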
The instruction following a delayed control-transfer instruction is called a delay instruction. The delay instruction is always fetched, even if the delayed control transfer is an unconditional branch. A bit in the delayed control-transfer instruction, however, can cause the delay instruction to be annulled (that is, to have no effect) if the branch is not taken (or, in the branch-always case, if the branch is taken).

Branch and CALL instructions use PC-relative displacements. The jump and link (JMPL) instruction uses a register-indirect target address. It computes its target address as either the sum of two r registers or the sum of an r register and a 13-bit signed immediate value. The branch instruction provides a displacement of ±8M bytes, while the CALL instruction's 30-bit word displacement allows a control transfer to an arbitrary 32-bit instruction address.

Ticc instructions cause a non-delayed transfer to a trap table entry (see Section 2.5 and Chapter 12).

2.2.4 Read/Write

The read/write register instructions read and write the contents of software-visible state/status registers. Software can use read/write ancillary state register instructions to read and write unique implementation-dependent processor registers. Whether each of these instructions is privileged is implementation-dependent. (See Section 7.3, Write PSR.)


2.2.5 Floating-Point Operate 


FPop instructions perform all floating-point calculations. They are regis- 
ter-to-register instructions that operate on the floating-point f registers. Like 
arithmetic/logical/shift instructions, an FPop computes а result that is a func- 
tion of one or two source operands. Specific floating-point operations are se- 
lected by a subfield of the FPop1/FPop2 instruction formats. See Chapter 11. 


2.2.6 Coprocessor Operate

Coprocessor operate (CPop) instructions are defined by the implemented coprocessor, if any. These instructions are specified by the CPop1 and CPop2 instruction formats.
2.3 Memory Model
The SPARC memory model has two functions:
- It defines the semantics of such memory operations as load and store, and
- It specifies the relationship between the order in which these operations are issued by a processor and the order in which they are executed by memory.


The model applies both to uniprocessors and shared-memory multiproces- 
sors (see Chapter 8). 


The standard memory model is called total store ordering (TSO). All SPARC implementations must provide at least TSO. An additional model called partial store ordering (PSO) is defined to allow higher-performance memory systems to be built. If present, this model is enabled via a mode bit, for example, in an MMU control register. Machines that implement strong consistency (also called strong ordering) automatically support both TSO and PSO because the requirements of strong consistency are more stringent. In strong consistency, the loads, stores, and atomic load-stores of all processors are executed by memory serially in an order that conforms to the order in which these instructions were issued by individual processors. However, a machine that implements strong consistency may deliver lower performance than an equivalent machine that implements TSO or PSO.


The general guidelines for programs are as follows: programs written for PSO 
will work automatically on a machine running in TSO mode or on a machine 
that implements strong consistency; programs written for TSO will work auto- 
matically on a machine that implements strong consistency; programs written 
for strong consistency may not work on a TSO or PSO machine; programs writ- 
ten for TSO may not work on a PSO machine. 


Multithreaded programs where all threads are restricted to run on a single pro- 
cessor will behave the same on PSO and TSO as they would on a Strongly 
Consistent machine. 
2.4 Input/Output

The SPARC architecture assumes that input/output registers are accessed via load/store alternate instructions, normal load/store instructions, coprocessor instructions, or read/write ancillary state register instructions (RDASR, WRASR). When load/store alternate instructions are used, the I/O registers can be accessed only by the supervisor.


The contents and addresses of I/O registers are dependent on the system im- 
plementation. 


Subject to Change Without Notice

2.5 Traps


Trap Categories 


A trap is a vectored transfer of control to the operating system through a special trap table that contains the first four instructions of each trap handler. The base address of the table is established by software in an IU state register (the trap base register, TBR). The displacement within the table is encoded in the type number of each trap. Half of the table is reserved for hardware traps, and the other half is reserved for software traps generated by trap (Ticc) instructions.
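The trap table lookup can be sketched as address arithmetic. The sketch assumes the Version 8 TBR layout (TBA in bits 31:12, tt in bits 11:4, bits 3:0 zero), so each table entry is 16 bytes, room for four instructions. It also assumes that Ticc sets tt to 128 plus the low seven bits of its computed trap number, which places software traps in the upper half of the table. Function names are hypothetical.

```c
#include <stdint.h>

/* Trap vector address: TBR.TBA (bits 31:12) | tt << 4 | 0000. */
static inline uint32_t trap_vector(uint32_t tbr, uint8_t tt)
{
    return (tbr & 0xFFFFF000u) | ((uint32_t)tt << 4);
}

/* Ticc trap type: 128 + ((r[rs1] + operand2) mod 128). */
static inline uint8_t ticc_tt(uint32_t rs1, uint32_t operand2)
{
    return (uint8_t)(0x80u + ((rs1 + operand2) & 0x7Fu));
}
```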


A trap causes the CWP to allocate a new register window and the hardware to write the program counters into two registers of the new window. The trap handler can access the saved PC and next program counter (nPC) and, in general, can freely use the six other local registers in the new window.


An exception or interrupt request can cause either a precise trap, a deferred 
trap, or an interrupting trap. 


A precise trap is induced by a particular instruction and occurs before any program-visible state is changed by the trap-inducing instruction.


A deferred trap is also induced by a particular instruction, but, unlike a precise trap, it may occur after a program-visible state is changed by the execution of one or more instructions that follow the trap-inducing instruction. A deferred trap can occur one or more instructions after the trap-inducing instruction is executed. An implementation must provide sufficient supervisor-readable state (called a deferred-trap queue) to enable it to emulate an instruction that caused a deferred trap and to correctly resume execution of the process containing that instruction.


An interrupting trap can be due to an external interrupt request not directly related to any particular instruction, or it can be due to an exception caused by a particular previously executed instruction. An interrupting trap is neither a precise trap nor a deferred trap. An implementation need not necessarily provide sufficient state to emulate an instruction that caused an interrupting trap.


User-application programs do not "see" traps unless they install user trap
handlers for those traps via calls to supervisor software. Also, the
treatment of implementation-dependent machine-check exceptions can vary
across systems. Therefore, SPARC lets an implementation define alternative
trap models for particular exception types.
The SPARC default trap model must be present in all implementations. It
states that all traps must be precise, except for:


• Floating-point or coprocessor traps, which may be deferred.

• Machine-check or non-resumable-error exceptions, which may be deferred
  or interrupting.

• Machine-check or non-resumable-error exceptions on the second access of
  a two-memory-access load/store instruction, which may be interrupting.


See Chapter 12 for information on how the SuperSPARC processor handles 
traps. 
2.6 SPARC Reference MMU


The SPARC reference MMU architecture is designed for use with SPARC
processors and is optional; systems may not need an MMU or may need an MMU
with different characteristics. The MMU architecture enables single-chip MMU
implementations to perform general-purpose memory management that
efficiently supports a large number of processes running a wide variety of
applications. The reference MMU uses three levels of page tables in main
memory to store full translation information, and page table entries are
cached in the MMU to provide quick translation.


The MMU features:

• 32-bit virtual address.
• 36-bit physical address.
• Fixed 4K-byte page size.
• Support for sparse address spaces with a three-level map.
• Support for large linear mappings (4K-, 256K-, 16M-, and 4G-byte).
• Support for multiple contexts.
• Page-level protections.
• Hardware miss processing.


The reference MMU architecture specifies both the behavior of the MMU hard- 
ware and the organization and contents of the tables in the main memory re- 
quired to support it. 
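As an illustration of the three-level organization, the following sketch splits a 32-bit virtual address into the index used at each table level plus the page offset. The 8/6/6/12-bit field widths are taken from the SPARC Reference MMU specification rather than from this section (they are consistent with the 4G-, 16M-, 256K-, and 4K-byte mapping sizes listed above); the names `split_va` and `mapping_size` are hypothetical.

```python
# Sketch: decompose a virtual address for a three-level SPARC Reference MMU walk.
# Assumed field widths (from the Reference MMU specification): 8/6/6/12 bits.

LEVEL_BITS = (8, 6, 6)   # index widths for the level-1, level-2, level-3 tables
PAGE_SHIFT = 12          # 4K-byte pages


def split_va(va):
    """Return (index1, index2, index3, offset) for a 32-bit virtual address."""
    if not 0 <= va < (1 << 32):
        raise ValueError("virtual address must fit in 32 bits")
    index1 = (va >> 24) & 0xFF   # selects a 16M-byte region
    index2 = (va >> 18) & 0x3F   # selects a 256K-byte segment
    index3 = (va >> 12) & 0x3F   # selects a 4K-byte page
    offset = va & 0xFFF          # byte within the page
    return index1, index2, index3, offset


def mapping_size(level):
    """Bytes mapped by one PTE found at `level` (0 = context root, 3 = page)."""
    remaining_index_bits = sum(LEVEL_BITS[level:])
    return 1 << (remaining_index_bits + PAGE_SHIFT)


print(split_va(0x12345678))                  # (18, 13, 5, 1656)
print([mapping_size(lvl) for lvl in range(4)])
```

A page table entry found before the third level ends the walk early and maps the correspondingly larger region, which is how the large linear mappings in the feature list are expressed.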


The SuperSPARC processor contains an on-chip SPARC reference MMU. 
See Chapter 9. 
The SuperSPARC processor (SSP) is a highly integrated, high-performance
implementation of the SPARC RISC architecture. It is a single-chip processor
implemented in full custom BiCMOS technology. It is intended for use in a
broad spectrum of system environments, from large-scale multiprocessor
systems to low-cost single-user workstations and high-performance embedded
control applications.


3.1 High Integration


The SuperSPARC processor integrates most of the support functions normally
required to build a SPARC-based system and has the following features:

• Integer Unit
• Memory Management Unit
• Floating-Point Unit
• Instruction Cache
• Data Cache
• Store Buffer
• External Cache Support
• Multi-Processor Cache Coherence Support
• Hardware Breakpoints
• JTAG Testability Access


See Figure 3-1, which is a simplified block diagram of the SuperSPARC
implementation of the SPARC architecture.
Figure 3-1. Functional Block Diagram

[Block diagram not reproduced. Recoverable block labels:
Floating-Point Execution: double-precision adder array; double-precision
multiplier array; integer multiply and divide; 32-entry FP register file.
Floating-Point Unit Control: FP queue; FP execution control; FP exception
control.
Integer Unit Control: instruction grouping and decode; exception handling.
Superscalar Integer Execution: two independent or cascadable ALUs, one
shifter; load/store address generator; 64-bit loads and stores; 8-window
(136-register) integer register file.
Instruction Cache.]
3.1.1 Integer Unit


The fully SPARC-compatible on-chip integer unit has a high-performance
superscalar (multiple instructions per cycle) design. Up to two compute
instructions and one memory access instruction can be executed in each
cycle. The 136 integer registers are divided into eight register windows and
eight global registers. Chapters 4, 5, and 10 contain detailed descriptions
of the integer unit's operation.
3.1.2 Memory Management Unit


SuperSPARC includes an implementation of the SPARC reference memory
management unit (MMU). The MMU's 64-entry translation lookaside buffer
(TLB) translates virtual to physical addresses. The second-level page table
pointer (PTP) and the root pointer are cached to reduce the time to service
TLB misses. Chapter 9 has more details.


3.1.3 Floating-Point Unit


The on-chip SPARC floating-point unit (FPU) and controller provide
high-performance single- and double-precision floating-point arithmetic and
also perform the integer multiply and divide instructions. Chapter 11
provides a detailed view of floating-point operation.


3.1.4 Instruction Cache


The large five-way set-associative instruction cache increases performance
and reduces the demands on an external memory system. The cache has 20K
bytes of total storage capacity. It is physically addressed and
non-writable, but it is kept consistent with the data cache and external
memory through extensive cache-coherence support.


3.1.5 Data Cache


An on-chip data cache provides the fast, single-cycle execution of load and
store instructions that is critical to high-performance reduced instruction
set computer (RISC) processors. The four-way set-associative cache has 16K
bytes of total storage. This cache enforces cache coherence with other
caches in the system. The data cache is physically addressed and, depending
on the system environment, works in either write-through or copy-back mode.
The behavior of the data cache is further explained in Chapter 10.


3.1.6 Store Buffer


The store buffer reduces the latency of store instructions. Its eight-entry
FIFO queue holds store data until it can be written out to the external
cache and/or memory. Each entry can hold the data from a single store
instruction. This buffering allows the pipeline to continue execution,
thereby increasing performance.


3.1.7 External Cache Support


The MultiCache Controller (MXCC), SuperSPARC's optional external cache
controller chip, implements a large, directly mapped, physically addressed
external cache. The MXCC serves as a single-chip interface to the level-2
MBus standard or as an interface to XBus, a packet-switched bus allowing
connection to a variety of system buses.
External cache data is stored in fully pipelined cache RAMs. The SuperSPARC
chips support SPARC's total store ordering (TSO) and partial store ordering
(PSO) memory models. See Chapter 8 and The SPARC Architecture Manual for
further details.


3.1.8 Multiprocessor Cache Coherence Support 


The SuperSPARC chips provide built-in multiprocessor cache coherence. The
protocol supports multiple cached copies of shared data.


Bus snooping implements the coherence algorithms. All coherence protocols 
are based on physical addresses. 


See Chapter 16 for further details. 


3.1.9 Hardware Breakpoints 


On-chip hardware breakpoints on code and data accesses simplify software
debugging and reduce system-development time. A single code or data access
breakpoint can be set on a virtual or physical address.


A 16-bit instruction counter and a 16-bit cycle counter support debugging
and performance analysis. These counters can also be used to generate
breakpoints.


All of these breakpoints have programmable actions: when one occurs, it can
generate an exception or interrupt, or toggle an external pin to help
trigger external analysis equipment.


See Chapters 17, 18, and 19 for further details. 


3.1.10 JTAG Emulation 


Through the SuperSPARC processor's JTAG IEEE P1149.1 asynchronous scan
interface, the state of the processor can be viewed or modified without
changing other processor states. The processor can be single-stepped
through a program, and all processor states can be observed after each
instruction group. This interface can also be used to view or modify
registers or system memory.


Only a JTAG control device with appropriate software is required to use this 
facility; no test pod or other specialized hardware is required. 


See Chapter 15 for further details. 
3.1.11 Full Testability


SuperSPARC is designed to be a highly testable device. JTAG scan gives
access to internal data paths and control logic for testing. Large internal
arrays are not in the scan chain but can still be tested through the serial
JTAG interface. An automatic power-up self-test can be initiated with or
without any external scan hardware. This, along with functional testing of
the arrays, verifies that the device is operating correctly. Boundary scan
can be used to perform board-level interconnect testing.
3.2 High-Performance Implementation Architecture


To improve performance beyond what the clock rate alone provides, the
internal architecture has been optimized to execute multiple instructions
simultaneously and to execute critical instructions quickly. These
architectural features double the average number of instructions executed
per cycle for integer programs. A greater improvement may be found for
floating-point-bound programs.


SuperSPARC typically executes programs from its cache at about 1.4 to 1.6 
instructions per cycle (IPC), or about 0.7 to 0.6 clocks per instruction (CPI). 
This figure decreases to about 1.1 IPC for large programs not fully contained 
in the cache. Floating-point performance is generally higher. 
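Since CPI is the reciprocal of IPC, the figures above convert directly, and either one can be combined with an instruction count and a clock rate to estimate run time. The sketch below shows the arithmetic; the 50 MHz clock and the instruction count are illustrative assumptions, not SuperSPARC specifications.

```python
def cpi(ipc):
    """Clocks per instruction is the reciprocal of instructions per cycle."""
    return 1.0 / ipc


def run_time_seconds(instruction_count, ipc, clock_hz):
    """Execution time = instruction count x CPI / clock frequency."""
    return instruction_count * cpi(ipc) / clock_hz


# IPC figures quoted above: the in-cache range and the large-program figure.
for ipc in (1.4, 1.6, 1.1):
    print(f"IPC {ipc} -> CPI {cpi(ipc):.2f}")

# Hypothetical workload: 100 million instructions on a 50 MHz clock.
print(run_time_seconds(100_000_000, 1.4, 50e6))
```

At the large-program figure of about 1.1 IPC, CPI rises to roughly 0.9.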


The major implementation architecture optimizations are outlined below. 


3.2.1 Multiple-Instructions-per-Cycle Execution 


SuperSPARC can issue up to three instructions simultaneously. Certain rules 
determine how many of the available instructions can be executed in any par- 
ticular cycle. These rules are fully described in Chapter 5. 


3.2.2 Fast Load and Store Instructions 


All load and store instructions operate in a single clock cycle when the refer- 
enced data is present in the on-chip data cache. This includes 64-bit transfers 
and floating-point transfers. When the data is not present in the on-chip data 
cache, a five-cycle penalty is imposed to access the external cache. Each
cache miss reads a block (32 bytes) of data from the external cache. These 
bus transactions are fully pipelined. The processor can use this data as soon 
as it arrives from the bus. 


When using external cache memory, no miss penalty is incurred for normal 
store misses. An internal store buffer holds the store transaction and allows 
the pipeline to continue. 


The instruction immediately following a load may use the data without
incurring any delay. There are some cases of interlocks between the load
instruction and the following address calculations (described in Section
5.4). The external bus is pipelined for high performance and can deliver
data from different addresses on successive cycles.


Because all of SuperSPARC's caches are physically addressed and are fully 
coherent, there is no need to flush cached entries, and virtual address aliasing 
conditions do not exist. Eliminating flushing overhead can boost performance 
significantly. 
3.2.3 Floating-Point Implementation


The SuperSPARC FPU allows one floating-point operation and one memory
reference to be issued in every clock cycle. SuperSPARC supports single- and
double-precision operations but not extended or quad-precision. The FPU
maintains a four-entry deferred trap queue (FQ) from which FPops are
executed. Some operations require more execution cycles than others; for
example, FDIV (floating-point divide) and FSQRT (floating-point square root)
use many more cycles than FADD. The SuperSPARC FPU also handles the integer
multiply and divide operations. Floating-point instructions are executed in
the order in which they are issued by the processor; no out-of-order
completion is allowed. Register dependencies can delay the execution stream,
and exceptions can interrupt the pipeline, sometimes requiring instruction
aborts. SuperSPARC handles all cases of normalization and register alignment
for double-precision arithmetic directly in hardware. SuperSPARC does not
generate unfinished exceptions (unfinished_FPop traps).


Chapter 11 provides details about the FPU. 
4.1 Integer Unit r Registers


The SuperSPARC processor (SSP) provides eight register windows in its
integer unit. The r registers are divided into 128 window registers and
eight global registers. Each register is 32 bits wide. The window registers
are divided into eight sets of 16 registers. At any time, an instruction can
access the eight global registers and a 24-register window of r registers.
A register window comprises the eight in registers and eight local registers
of a particular register set, along with the eight in registers of the
adjacent register set, which are addressable as the window's out registers.
See Figure 4-1.


Figure 4-1. Windowed r Registers


Instructions that access double-words in r registers require an aligned pair of 
registers. The least significant bit of an r register index in these instructions is 
reserved and should be set to zero. An attempt to use a double-word load or 
store to a misaligned (odd) destination register will cause an illegal instruction 
trap. 




The current window into the r registers is given by the current window pointer 
(CWP) in the processor state register. The window indexing is as shown in 
Table 4-1. 


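Table 4-1. Window Addressing

Windowed Register Index     r Register Index
in[0]-in[7]                 r[24]-r[31]
local[0]-local[7]           r[16]-r[23]
out[0]-out[7]               r[8]-r[15]
global[0]-global[7]         r[0]-r[7]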
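The window addressing can be modeled in a few lines. This sketch assumes the SPARC convention that the out registers of window CWP are the in registers of register set CWP - 1 (modulo eight), so that a SAVE, which decrements the CWP, turns the caller's outs into the callee's ins. The flat slot numbering and the name `physical_slot` are hypothetical and serve only to show which registers are shared.

```python
NWINDOWS = 8   # SuperSPARC: eight register sets of 16 registers each


def physical_slot(cwp, r):
    """Map r[0]..r[31], as seen under window `cwp`, to a slot 0..135.

    Hypothetical layout: slots 0-7 hold the globals; register set s occupies
    slots 8 + 16*s through 8 + 16*s + 15, with its ins first, then its locals.
    """
    if not (0 <= cwp < NWINDOWS and 0 <= r < 32):
        raise ValueError("cwp or register index out of range")
    if r < 8:                                   # globals: shared by all windows
        return r
    if r < 16:                                  # outs: the ins of set CWP - 1
        s, offset = (cwp - 1) % NWINDOWS, r - 8
    elif r < 24:                                # locals: private to set CWP
        s, offset = cwp, 8 + (r - 16)
    else:                                       # ins: set CWP
        s, offset = cwp, r - 24
    return 8 + 16 * s + offset


# After SAVE (CWP 3 -> 2), the caller's out[0] is the callee's in[0].
print(physical_slot(3, 8), physical_slot(2, 24))   # 40 40
```

Enumerating every (CWP, r) pair in this model touches exactly 136 distinct slots, matching the eight global plus 128 window registers.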
4.2 Processor State Register


The 32-bit processor state register (PSR) holds key status and control
information. The instructions that modify the PSR's fields include SAVE,
RESTORE, Ticc, RETT, and any instructions that modify the condition codes.
The privileged instructions RDPSR and WRPSR read and write the PSR directly.


See Figure 4-2.


Figure 4-2. Processor State Register


The PSR contains the following fields: 


impl Implementation. Bits 24 through 31 identify the processor
implementation and version. SuperSPARC's implementation number is 0x40.
The WRPSR instruction does not affect the contents of this field.


icc Integer Condition Codes. Bits 20 through 23 contain SuperSPARC's
condition codes. These bits are modified by WRPSR and instructions ending
in cc. Conditional branch and trap instructions base their control transfer
on these bits, which are defined as shown in Figure 4-3.


Figure 4-3. Integer Condition Codes (icc)

| n | z | v | c |
  23  22  21  20


icc.n Negative. Bit 23 indicates whether the 32-bit 2's complement
arithmetic logic unit (ALU) result was negative for the last instruction
that modified the icc field. 1 = negative, 0 = not negative.

icc.z Zero. Bit 22 indicates whether the 32-bit ALU result was zero for the
last icc-modifying instruction. 1 = zero, 0 = nonzero.


icc.v Overflow. Bit 21, the overflow bit, indicates whether the ALU result
of the last icc-modifying instruction overflowed, that is, was not
representable in 32-bit 2's complement notation. The overflow bit is also
set if a tagged operation is performed on non-tagged operands.
1 = overflow, 0 = no overflow.


icc.c Carry. Bit 20 indicates whether a 2's complement carry (borrow) out of
bit 31 resulted from the last icc-modifying addition (subtraction).
1 = carry, 0 = no carry.


reserved Reserved. Bits 14 through 19 are reserved. When read by RDPSR,
these bits return zero. A WRPSR should write only 0s to this field.


EC Enable Coprocessor. Bit 13 determines whether a coprocessor is enabled.
In SuperSPARC, this bit is permanently set to zero, so coprocessor
instructions cause a cp-disabled trap. If a WRPSR instruction attempts to
set the EC bit, an illegal instruction trap is generated.


EF Enable FPU. Bit 12 determines whether the FPU is enabled. When it is
disabled, a floating-point instruction causes an fp-disabled trap.
1 = enabled, 0 = disabled.





Note:


Software can use the EF bit to determine whether the floating-point unit
(FPU) is used by a particular process. If the FPU is unused by a process,
the f registers need not be saved across a context switch.


PIL Processor Interrupt Level. Bits 8 (LSB) through 11 (MSB) determine the
external interrupt priority level above which SuperSPARC will accept
external interrupts.


S Supervisor. Bit 7 determines whether the processor is in supervisor or
user mode. 1 = supervisor mode, 0 = user mode.


PS Previous Supervisor. Bit 6 contains the value of the S bit at the time of
the most recent trap.


ET Enable Traps. Bit 5 determines whether traps are enabled. A trap
automatically sets ET to zero, disabling further traps. While traps are
disabled (ET = 0), interrupt requests are ignored, and an exception trap
causes SuperSPARC to halt execution, enter error mode, and take a watchdog
reset. 1 = traps enabled, 0 = traps disabled.


CWP Current Window Pointer. Bits 0 (LSB) through 4 (MSB) contain the CWP,
a pointer to the current active register window. The hardware increments
the CWP on RESTORE and RETT and decrements it on SAVE and on a trap.
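As an aid to reading RDPSR results, the sketch below extracts each field using the bit positions listed above. The table layout, the function names, and the example PSR value are illustrative assumptions, not part of any SuperSPARC software interface.

```python
PSR_FIELDS = {          # name: (lowest bit, width in bits), from the list above
    "impl": (24, 8),
    "icc": (20, 4),
    "EC": (13, 1),
    "EF": (12, 1),
    "PIL": (8, 4),
    "S": (7, 1),
    "PS": (6, 1),
    "ET": (5, 1),
    "CWP": (0, 5),
}


def decode_psr(psr):
    """Split a 32-bit PSR value (for example, one read with RDPSR) into fields."""
    return {name: (psr >> low) & ((1 << width) - 1)
            for name, (low, width) in PSR_FIELDS.items()}


def icc_bits(psr):
    """Return the individual n, z, v, c condition-code bits (bits 23-20)."""
    icc = (psr >> 20) & 0xF
    return {"n": (icc >> 3) & 1, "z": (icc >> 2) & 1,
            "v": (icc >> 1) & 1, "c": icc & 1}


# Hypothetical value: impl 0x40, z set, FPU enabled, PIL 15,
# S and PS set, traps enabled, CWP 7.
psr = 0x40401FE7
print(decode_psr(psr))
print(icc_bits(psr))
```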
4.3 Window Invalid Mask


The Window Invalid Mask (WIM) register is controlled by supervisor
software. Hardware uses the WIM to determine which window(s) cause window
overflow and underflow traps on SAVE, RESTORE, or RETT instructions.


Each bit in the WIM register corresponds to a register window. WIM[n]
corresponds to the register set addressed when CWP = n. If WIM[n] = 1,
window n is marked invalid. If a SAVE instruction would decrement the CWP
into an invalid window, a window_overflow trap is generated. If a RESTORE
or RETT instruction would increment the CWP into an invalid window, a
window_underflow trap is generated.


The WIM is read and written by the privileged instructions RDWIM and
WRWIM, respectively. Bits 8 through 31, which correspond to unimplemented
windows, read as zeros and are unaffected by writes. See Figure 4-4.


Figure 4-4. Window Invalid Mask
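The overflow and underflow checks reduce to testing one WIM bit at the CWP the instruction would move to. A minimal sketch, assuming eight windows and CWP arithmetic modulo eight; the function names are hypothetical, and the model covers only the WIM check, not the rest of the trap sequence.

```python
NWINDOWS = 8


def save_would_overflow(cwp, wim):
    """SAVE decrements the CWP; a set WIM bit there means window_overflow."""
    return bool((wim >> ((cwp - 1) % NWINDOWS)) & 1)


def restore_would_underflow(cwp, wim):
    """RESTORE and RETT increment the CWP; a set WIM bit means window_underflow."""
    return bool((wim >> ((cwp + 1) % NWINDOWS)) & 1)


def wrwim(value):
    """Model WRWIM: only bits 0-7 exist, so bits 8-31 read back as zero."""
    return value & ((1 << NWINDOWS) - 1)


wim = wrwim(0xFFFFFF01)          # hypothetical: window 0 marked invalid
print(hex(wim), save_would_overflow(1, wim), restore_would_underflow(7, wim))
```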
4.4 Trap Base Register


When a trap occurs, the program counter (PC) is loaded with the contents of
the trap base register (TBR). The TBR is a pointer into the trap table that
contains the trap-handler address. The privileged instruction RDTBR reads
the entire register. Bits 0 through 3 are zeros, and the WRTBR instruction
should always be issued with zeros in this field. The TBR contains the
fields shown in Figure 4-5.


Figure 4-5. Trap Base Register

| TBA       | tt       | 0000 |
  31     12   11     4   3  0


TBA Trap Base Address. Bits 12 through 31 contain the 20-bit trap base
address. This field is written by the privileged WRTBR instruction. The
trap base address is usually established by supervisor software.


tt Trap Type. Bits 4 through 11 contain the eight-bit trap type field.
This field is written by the hardware, based on the type of trap taken,
and provides an offset into the trap table. The tt field retains its value
until the next trap and is not affected by the WRTBR instruction. The tt
field can be directly manipulated by Ticc instructions.
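The address computation can be sketched as follows. Because tt occupies bits 4 through 11, consecutive trap types are 16 bytes (four instructions) apart, which matches the four-instruction table entries described in Section 2.5. The Ticc rule in `ticc_tt` (tt = 128 plus the low seven bits of the software trap number) is taken from the SPARC Version 8 definition of Ticc rather than from this section; the function names and the example TBR value are hypothetical.

```python
def trap_target(tbr, tt):
    """PC loaded on a trap: TBA (TBR bits 31-12) with tt placed in bits 11-4."""
    if not 0 <= tt <= 0xFF:
        raise ValueError("tt is an eight-bit field")
    return (tbr & 0xFFFFF000) | (tt << 4)


def ticc_tt(trap_number):
    """Software trap type for Ticc (SPARC V8 rule): 128 + low seven bits."""
    return 0x80 + (trap_number & 0x7F)


tbr = 0x40000000                            # hypothetical trap base address
print(hex(trap_target(tbr, 0x05)))          # 0x40000050
print(hex(trap_target(tbr, ticc_tt(3))))    # 0x40000830
```

Software traps always land in the upper half of the table because `ticc_tt` never returns a value below 0x80.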
4.5 Multiply/Divide Register (Y)


The 32-bit Y register contains the most significant word of the 64-bit
product of an integer multiply using an SMUL, SMULcc, UMUL, UMULcc, or
MULScc instruction. The Y register also holds the most significant word of
the 64-bit dividend of an integer divide using an SDIV, SDIVcc, UDIV, or
UDIVcc instruction. The Y register is read and written by the
non-privileged RDY and WRY instructions, respectively.
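The split between Y and the destination register can be illustrated with Python's arbitrary-precision integers. This sketch covers UMUL, SMUL, and UDIV only; the helper names are hypothetical, and MULScc (multiply step) is not modeled.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(x):
    """Interpret the low 32 bits of x as a 2's complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def umul(rs1, rs2):
    """UMUL: 64-bit unsigned product; returns (Y, rd) = (high word, low word)."""
    product = (rs1 & MASK32) * (rs2 & MASK32)
    return product >> 32, product & MASK32


def smul(rs1, rs2):
    """SMUL: signed product, kept as a 64-bit 2's complement bit pattern."""
    product = (to_signed32(rs1) * to_signed32(rs2)) & MASK64
    return product >> 32, product & MASK32


def udiv(y, rs1, rs2):
    """UDIV: divides the 64-bit dividend Y:rs1 by rs2.

    This sketch does not model quotient overflow or the division-by-zero trap.
    """
    dividend = ((y & MASK32) << 32) | (rs1 & MASK32)
    return dividend // (rs2 & MASK32)


print([hex(w) for w in umul(0xFFFFFFFF, 2)])   # ['0x1', '0xfffffffe']
```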
4.6 Program Counters (PC, nPC)


The 32-bit PC contains the address of the instruction currently being
executed in SuperSPARC's integer unit. The next program counter (nPC) holds
the address of the next instruction to be executed (assuming a trap does
not occur).


The nPC facilitates delayed control transfers. The delay instruction is
executed (unless the control transfer annuls it) before control transfers
to the target. During execution of the delay instruction, the nPC points to
the target of the control transfer instruction.


Both the PC and nPC are available in the local registers of the trap
handler after a trap. This allows the handler to choose between resuming
program execution at the trapping instruction or at the instruction
following it.
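The sequencing rule and the handler's two resume choices can be sketched as follows. The skip case mirrors the conventional SPARC Version 8 return sequence (resume at the saved nPC, with the following address as the new nPC); annulled delay slots are not modeled, and the function names are hypothetical.

```python
def next_pcs(pc, npc, taken_target=None):
    """Advance (PC, nPC) past the instruction at PC.

    The next PC is always the old nPC. If the instruction at PC is a taken
    delayed control transfer, the new nPC is its target; otherwise execution
    continues sequentially. Annulled delay slots are not modeled.
    """
    new_npc = taken_target if taken_target is not None else npc + 4
    return npc, new_npc


def resume_point(saved_pc, saved_npc, skip_trapping_instruction):
    """(PC, nPC) a handler resumes with: retry the trapping instruction or skip it."""
    if skip_trapping_instruction:
        return saved_npc, saved_npc + 4
    return saved_pc, saved_npc


# A taken branch at 0x1000 to 0x2000: the delay slot at 0x1004 runs next
# with nPC already pointing at the target.
pc, npc = next_pcs(0x1000, 0x1004, taken_target=0x2000)
print(hex(pc), hex(npc))          # 0x1004 0x2000
pc, npc = next_pcs(pc, npc)
print(hex(pc), hex(npc))          # 0x2000 0x2004
```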
4.7 Ancillary State Registers


SuperSPARC has no ancillary state registers, but it encodes the STBAR and
SIGM instructions as special cases of RDASR, the instruction for reading
ancillary state registers. (See Chapter 7.) The STBAR instruction is
implemented in the RDASR-reserved instruction space and is equivalent to
reading ancillary state register (ASR) 0x0f (RDASR 0x0f, %g0). The SIGM
instruction is implemented in the RDASR implementation-dependent extended
opcode space as defined by Version 8 of the SPARC architecture. SIGM is
equivalent to reading ASR 0x1f (RDASR 0x1f, %g0).
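To see how these instructions fit the RDASR encoding, the sketch below assembles the instruction words. The field layout (op = 2, op3 = 0x28, the ASR number in the rs1 field, rd = 0 for %g0) is taken from the SPARC Version 8 instruction formats rather than from this section, and `rdasr` is a hypothetical helper.

```python
def rdasr(asr, rd=0):
    """Encode RDASR asr, r[rd] (SPARC V8 format 3: op = 2, op3 = 0x28, i = 0)."""
    op, op3 = 0b10, 0x28
    return (op << 30) | (rd << 25) | (op3 << 19) | (asr << 14)


STBAR = rdasr(0x0F)   # RDASR 0x0f, %g0
SIGM = rdasr(0x1F)    # RDASR 0x1f, %g0
print(hex(STBAR), hex(SIGM))   # 0x8143c000 0x8147c000
```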
4.8 Floating-Point f Registers


SuperSPARC has 32 32-bit floating-point f registers, numbered f[0] through
f[31]. There is no windowing of the f registers; floating-point instructions
have access to all 32 registers at all times. A single f register can hold
one single-precision operand. A double-precision operand requires an
aligned pair of f registers. Thus the f registers can hold a maximum of 32
single-precision or 16 double-precision operands. Instructions that access
a floating-point double in the f registers assume double alignment, and the
least significant bit of a double-precision f register specifier is
reserved and should be set to zero.
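The pairing rule can be illustrated with a software check that a double-precision specifier is even, plus the split of an IEEE double into the two 32-bit words an aligned f register pair would hold. The word ordering (even register holds the most significant word) reflects SPARC's big-endian convention and is an assumption of this sketch, as are the function names.

```python
import struct


def check_double_specifier(n):
    """Reject odd or out-of-range f register numbers for a double-precision operand."""
    if not 0 <= n < 32 or n % 2 != 0:
        raise ValueError(f"f[{n}] cannot hold a double-precision operand")
    return n


def double_to_words(value):
    """Return the (f[n], f[n+1]) words of a double held in an aligned pair.

    Assumes SPARC big-endian ordering: the even register holds the most
    significant word (sign, exponent, and high fraction bits).
    """
    high, low = struct.unpack(">II", struct.pack(">d", value))
    return high, low


check_double_specifier(4)
print([hex(w) for w in double_to_words(1.0)])   # ['0x3ff00000', '0x0']
```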
4.9 Floating-Point State Register


The 32-bit floating-point state register (FSR) fields contain mode and
status information. The FSR is read and written by the STFSR and LDFSR
instructions, respectively. See Figure 4-6.


Figure 4-6. Floating-Point State Register (FSR)

| RD    | u     | TEM   | NS | res   | ver   | ftt   | qne | u  | fcc   | aexc | cexc |
  31-30   29-28   27-23   22   21-20   19-17   16-14   13    12   11-10   9-5    4-0
The FSR contains the following fields:


RD Rounding Direction. Bits 30 and 31 select an ANSI/IEEE Standard 754-1985
rounding direction for floating-point results. See Table 4-2.


Table 4-2. Rounding Direction (RD) Field of FSR

RD   Round Toward:
0    Nearest (even)
1    Zero
2    +Infinity
3    -Infinity
u Unused. Bits 29, 28, and 12 are unused in the SuperSPARC implementation.
To ensure future compatibility, software should always issue an LDFSR with
zeros in these bits.


TEM Trap Enable Mask. Bits 23 through 27 represent the FPU's trap enable
mask. Each of the five bits represents one of the five floating-point
exceptions that can be indicated in the current exception (cexc) field. If
a floating-point operate (FPop) instruction generates one or more
exceptions, and the TEM bit corresponding to one or more of the exceptions
is 1, an fp exception trap is generated. A TEM bit value of zero prevents
that exception type from generating a trap. See Figure 4-7.
Figure 4-7. Trap Enable Mask (TEM) Field of the FSR

| NVM | OFM | UFM | DZM | NXM |
  27    26    25    24    23


NVM Invalid Trap Mask. Bit 27 represents the invalid operation trap mask.
An invalid operation exception occurs when an improper operand is supplied
to an FPop instruction; for example, 0 ÷ 0 is invalid. 0 = disable invalid
operation trap, 1 = enable invalid operation trap.


OFM Overflow Trap Mask. Bit 26 represents the overflow trap mask. An
overflow exception occurs when the rounded result would be larger than the
largest normalized number in the result format. 0 = disable overflow trap,
1 = enable overflow trap.


UFM Underflow Trap Mask. Bit 25 represents the underflow trap mask. An
underflow exception occurs when the rounded result is inexact and would be
smaller than the smallest normalized number in the result format.
0 = disable underflow trap, 1 = enable underflow trap.


DZM Divide By Zero Trap Mask. Bit 24 represents the divide by zero trap
mask. A divide by zero exception occurs for all X ÷ 0, where X is
normalized or subnormal but not zero (0 ÷ 0 does not generate a
divide-by-zero exception; it is an invalid operation). 0 = disable divide
by zero trap, 1 = enable divide by zero trap.


NXM Inexact Trap Mask. Bit 23 represents the inexact trap mask. An inexact
exception occurs if the rounded result of an FPop instruction differs from
the infinitely exact result. 0 = disable inexact trap, 1 = enable inexact
trap.


NS Non-Standard Mode. Bit 22 represents non-standard mode execution.
SuperSPARC ignores the NS bit; it can be read or written but has no effect
on floating-point execution. SuperSPARC adheres to ANSI/IEEE Standard
754-1985 and is unaffected by the NS bit's setting.


ver Version. Bits 17 through 19 represent the FPU version. On SuperSPARC,
this field is always zero.
ftt Floating-Point Trap Type. Bits 14 through 16 identify the
floating-point exception trap type. The ftt field encodes the type of
exception that occurred and retains it until an STFSR or another FPop is
executed. The ftt can be read by the STFSR instruction, but the LDFSR
instruction does not affect the field. The exception types are encoded as
shown in Table 4-3.


Table 4-3. Floating-Point Trap Type (ftt) Field of FSR

ftt   Trap Type
000   None
001   IEEE_754_exception
010   unfinished_FPop
011   unimplemented_FPop
100   sequence_error
101   hardware_error
110   invalid_fp_register
111   reserved


qne Queue Not Empty. Bit 13 indicates whether SuperSPARC's four-entry
floating-point queue is empty after a floating-point exception or STDFQ
instruction has been issued. The qne bit can be read by the STFSR
instruction but is unaffected by the LDFSR instruction. 0 = queue is empty,
1 = queue is not empty.








fcc Floating-Point Condition Codes. Bits 10 and 11 contain the
floating-point condition codes. (See Table 4-4.) These bits are modified by
the FCMP and FCMPE floating-point compare instructions. The fcc field can be
read and written by the STFSR and LDFSR instructions, respectively.
Floating-point branches (FBfcc) base their control transfer on this field.
In the following table, the question mark (?) denotes an unordered
relation, which is true if either fs1 or fs2 is a NaN.


Table 4-4. Floating-Point Condition Codes (fcc) Field of FSR

fcc   Relation
0     fs1 = fs2
1     fs1 < fs2
2     fs1 > fs2
3     fs1 ? fs2 (unordered)
aexc Accrued Exceptions. Bits 5 through 9 represent the accrued exception
field of the floating-point unit. (See Figure 4-8.) After an FPop
completes, the TEM and cexc fields are logically ANDed together. If the
result is nonzero, an fp exception trap is generated; otherwise, the new
cexc is logically ORed into the aexc field. While traps are masked by the
TEM field, exceptions are accumulated in the aexc field. The bits in the
aexc field have the same definitions as those in the TEM field.


Figure 4-8. Accrued Exception Bits (aexc) Field of the FSR

| nva | ofa | ufa | dza | nxa |
  9     8     7     6     5


cexc Current Exceptions. Bits 0 through 4 indicate that one or more 
IEEE 754 floating-point exceptions were generated by the most 
recently executed floating-point instruction. (See Figure 4—9.) 
The cexc field is automatically cleared by the execution of the 
next floating-point instruction. The bits in the cexc field assume 
the same definition as those in the TEM field. 


Figure 4-9. Current Exception Bits (cexc) Field of the FSR

| nvc | ofc | ufc | dzc | nxc |
  4     3     2     1     0
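The accrual rule described under aexc can be sketched as a single update function. The bit constants follow the order shown in Figures 4-7 through 4-9 (invalid in the most significant position, inexact in the least); the function name is hypothetical, and the sketch does not model the ftt field or the floating-point queue.

```python
# Exception bit positions within TEM, aexc, and cexc (same order in all three).
NV, OF, UF, DZ, NX = 0x10, 0x08, 0x04, 0x02, 0x01

TEM_SHIFT, AEXC_SHIFT = 23, 5


def complete_fpop(fsr, exceptions):
    """Apply the TEM/cexc/aexc rule after an FPop raises `exceptions`.

    Returns (new_fsr, trapped). ftt and the FQ are not modeled.
    """
    exceptions &= 0x1F
    tem = (fsr >> TEM_SHIFT) & 0x1F
    fsr = (fsr & ~0x1F) | exceptions                  # cexc reflects the latest FPop
    if tem & exceptions:
        return fsr, True                              # fp exception trap; aexc unchanged
    return fsr | (exceptions << AEXC_SHIFT), False    # otherwise accrue into aexc


fsr = NV << TEM_SHIFT                     # hypothetical: only invalid traps enabled
fsr, trapped = complete_fpop(fsr, NX)     # inexact is masked, so it accrues
print(trapped, hex((fsr >> AEXC_SHIFT) & 0x1F))   # False 0x1
fsr, trapped = complete_fpop(fsr, NV)     # invalid is enabled, so it traps
print(trapped, hex(fsr & 0x1F))                   # True 0x10
```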
4.10 Floating-Point Queue


SuperSPARC's four-entry-deep floating-point queue contains enough
information to implement resumable, deferred floating-point traps. All
FPops are written to the queue (FPops do not include memory references, FSR
operations, or the privileged STDFQ). The contents of the floating-point
queue should be stored only in exception mode. An STDFQ issued when not in
exception mode causes SuperSPARC to hold the pipeline until all
floating-point operations have completed and then store information about
the last completed floating-point operation.


An STDFQ returns a doubleword containing the virtual address and 32-bit
opcode of the instruction at the front of the queue. The format is shown in
Figure 4-10.

Figure 4-10. Floating-Point Queue Format

address<2:0>   Contents (bits 31-0)
0              32-bit virtual address (program counter)
4              32-bit opcode
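A debugger or trap handler inspecting a stored FQ entry only needs to split the doubleword at the offsets given in the queue format. This sketch assumes the entry has been copied into a Python `bytes` object in memory (big-endian) order; the function name and example bytes are hypothetical.

```python
import struct


def unpack_fq_entry(doubleword):
    """Split the 8 bytes stored by STDFQ into (virtual address, opcode).

    Offset 0 holds the FPop's address and offset 4 its opcode; SPARC memory is
    big-endian, so the two words are read with the '>II' format.
    """
    if len(doubleword) != 8:
        raise ValueError("an FQ entry is one 8-byte doubleword")
    return struct.unpack(">II", doubleword)


# Hypothetical memory image of one stored entry.
raw = bytes.fromhex("00002000" "81a00820")
address, opcode = unpack_fq_entry(raw)
print(hex(address), hex(opcode))   # 0x2000 0x81a00820
```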
Efficient use of the SuperSPARC processor's (SSP's) pipeline is a major
factor in system performance. This chapter describes SuperSPARC's pipeline
operation and how to use it efficiently. Pipeline fundamentals are
introduced, and the intricacies of SuperSPARC's pipeline are illustrated
through diagrams and examples. Code generation strategies are also
discussed, and guidelines are given to obtain maximum performance using
SuperSPARC's superscalar pipeline.
Introduction


This chapter introduces instruction execution in an abstract framework.
Understanding the subtle details of the SuperSPARC processor's instruction
execution helps in realizing the processor's full potential for high
performance.


A program's instructions are arranged in the order in which they will be
visited by the program counter (PC). SPARC program order is defined by the
SPARC architecture. Most processor implementations, including SuperSPARC, do
not actually perform all the steps of a program in program order, but they
must give every appearance of having done so. Two implementation techniques
used in the SuperSPARC processor, pipelined execution and superscalar
execution, deviate from program order. In addition, SuperSPARC's store
buffer can delay stores relative to loads from other addresses, and some
memory systems may perform operations out of order.


Since the programmer relies on program order when constructing the
program, the processor must maintain the illusion of program order at all
times. This is most difficult around traps.


A program is executed for its effects: the changes it makes to registers and memory, and the I/O it performs. One technique SuperSPARC uses to preserve the appearance of program order is to execute instructions speculatively and, if they turn out not to be needed, cancel their effects before registers, memory, or I/O are affected. A cancelled instruction is called a squashed, or aborted, instruction.


The execution of an instruction can be broken down into several steps, including fetching, decoding, computing, and storing the results. Rather than completing all these steps for one instruction before starting the next, the steps for different instructions can be overlapped. The first step of each new instruction can be performed while the previous instruction is on the second step and the instruction before that is on the third step. In this manner, several instructions are executed simultaneously, thereby reducing the average execution time per instruction. This technique is called pipelining.


A pipelined instruction sometimes requires an input that is being computed by 
one of the several previous instructions that have begun execution but are not 
yet complete. In this case, the instruction might wait until all the required in- 
puts have been stored as results before beginning execution. A faster scheme 
is to issue an instruction as soon as it is certain all of its inputs will be available 
in time to be used. This is called data forwarding, or forwarding. For this to 
work, the processor must contain data paths to route the results of earlier 
instructions directly to the inputs of computations for later instructions. 
Pipelining reduces the time required to execute a sequence of instructions, because instructions can be started more frequently. Computation per unit time is measured as throughput, and pipelining improves throughput. However, pipelining does not reduce the time required to execute a single instruction. This time to produce an answer is called latency.
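The distinction between throughput and latency can be made concrete with the usual idealized count. Assume a k-stage pipeline with no stalls, in which each step takes one cycle. Then N instructions take k*N cycles without pipelining but k + N - 1 cycles with it, while each individual instruction still takes k cycles. The function names are illustrative only. The stage count of 4 in the usage line echoes SuperSPARC's four-cycle integer pipeline but ignores grouping and stalls.

```python
def unpipelined_cycles(n: int, k: int) -> int:
    # Each instruction completes all k steps before the next one starts.
    return n * k


def pipelined_cycles(n: int, k: int) -> int:
    # The first instruction needs k cycles; each later one completes one
    # cycle after its predecessor (ideal case: no stalls or bubbles).
    return 0 if n == 0 else k + n - 1


def latency(k: int) -> int:
    # A single instruction still passes through all k steps.
    return k


n, k = 100, 4
print(unpipelined_cycles(n, k), pipelined_cycles(n, k), latency(k))  # 400 103 4
```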


A superscalar microprocessor can issue and execute two or more instructions in parallel. A superscalar design increases the amount of work a processor can perform per cycle, thereby increasing its efficiency. Ideally, with N concurrent operations, the performance of a superscalar processor would exceed that of a scalar processor by a factor of N. Data dependencies, procedural dependencies, and resource conflicts, however, limit the magnitude of this performance boost. To reduce these effects, SuperSPARC has logic to dynamically schedule instructions and has duplicated functional units. These features maximize SuperSPARC's ability to issue three instructions per cycle. As a result, SuperSPARC's performance typically exceeds that of a scalar processor by 40%.


SuperSPARC is a single-pipeline, three-way, dynamic superscalar microprocessor. It can issue up to three instructions in each clock cycle (three-way superscalar). It decides how many instructions to issue by examining the next few instructions available, rather than by issuing groups of instructions that were previously marked in memory by the compiler or the assembly language programmer. This is called dynamic grouping. Finally, the instructions chosen for a group are executed together in lock-step; if one instruction of the group is delayed (for example, by a cache miss), the entire group waits. The effect is that of a single pipeline three instructions wide. This technique greatly simplifies trapping and error recovery.
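The following toy model illustrates lock-step execution. It assumes that each instruction in a group carries a hypothetical number of extra stall cycles, for example from a cache miss. Because the group advances as a unit, the whole group is held for the largest stall among its members.

```python
from typing import Sequence


def group_cycles(stalls: Sequence[int]) -> int:
    """Cycles a group spends in a stage: one, plus the worst member's stall.

    'stalls' holds hypothetical extra wait cycles per instruction; because
    the group moves in lock-step, one delayed instruction holds back all.
    """
    return 1 + max(stalls, default=0)


def total_cycles(groups: Sequence[Sequence[int]]) -> int:
    # Groups pass through the stage one after another, so their times add.
    return sum(group_cycles(g) for g in groups)


# The second group holds a load that waits 5 cycles on a miss; its partners wait too.
print(total_cycles([[0, 0, 0], [0, 5, 0], [0]]))  # 8
```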
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5.2 Pipeline Fundamentals 


The SuperSPARC processor's pipeline consists of eight stages, which execute in four clock cycles. Each stage has unique functions that contribute to completing the instructions in the group. Different types of instructions are supported by different functions in several of the pipeline stages. The SuperSPARC pipeline stages are:

F0 F1 D0 D1 D2 E0 E1 WB

Each stage is described in the following sections.


5.2.1 F0/F1 (Fetch)

All instructions must be fetched before they are executed. However, not every instruction is fetched in the cycle immediately preceding its execution. An instruction may be prefetched and placed in the instruction queue. The fetch stages (F0/F1) of the pipeline manage the instruction queue, including fetching and prefetching required instructions from memory. Not every fetched instruction is executed. Some instructions may be discarded if a control transfer instruction (branch) changes the flow of execution. Up to 128 bits (four instructions) may be read from the instruction cache in every cycle. These instructions enter the instruction queue and can be removed at a maximum rate of three instructions per cycle.


5.2.2 D0 (Grouping)




The D0 stage selects the first one, two, or three instructions from the instruction queue to form a group. This selection depends on the set of instruction candidates available at the head of the instruction queue prefetch buffer, as well as the current state of the processor pipeline. The grouping rules used to form this selection are described in Section 6.6. These instructions must be taken in order from the queue; SuperSPARC does not execute instructions out of order.
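Dynamic grouping can be illustrated with the sketch below. It greedily takes instructions in order from the head of the queue and applies only three of the constraints that appear in this chapter: at most three instructions per group, at most one memory reference per group, and a group ends after a control transfer instruction. This is a simplified, hypothetical model. The full rules of Section 6.6, such as write ports, cascades, and floating-point restrictions, are not reproduced, and the opcode sets are illustrative subsets. Applied to the loop of Example 5-4, it happens to produce the same four groups shown there.

```python
# Illustrative subsets only; not complete SPARC opcode classes.
MEMORY_OPS = {"ld", "ldd", "st", "std"}
CTI_OPS = {"ba", "be", "bne", "bz", "call", "jmpl"}


def group_instructions(program: list[str], width: int = 3) -> list[list[str]]:
    """Greedy in-order grouping under three simplified rules (hypothetical)."""
    groups: list[list[str]] = []
    current: list[str] = []
    mem_refs = 0
    for insn in program:
        op = insn.split()[0]
        is_mem = op in MEMORY_OPS
        # Close the current group if it is full or already has a memory reference.
        if current and (len(current) == width or (is_mem and mem_refs == 1)):
            groups.append(current)
            current, mem_refs = [], 0
        current.append(insn)
        if is_mem:
            mem_refs += 1
        # A control transfer instruction ends its group.
        if op in CTI_OPS:
            groups.append(current)
            current, mem_refs = [], 0
    if current:
        groups.append(current)
    return groups


loop = [  # Example 5-4
    "ldd [%l0],%f2", "faddd %f2,%f0,%f6", "add %l0,0x8,%l0",
    "ldd [%l0],%f4", "add %l0,0x4,%l0", "fmuld %f4,%f0,%f8",
    "ldd [%l0+4],%f10", "cmp %l0,0x100", "be Loop",
    "faddd %f6,%f8,%f0",
]
for g in group_instructions(loop):
    print(" ; ".join(g))
```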


Once a group of instructions is selected, D0 identifies the single memory reference instruction in the group (if there is one) and latches the corresponding register index. D0 forms extension words from the immediate values of memory reference instructions and the displacements of control transfer instructions. D0 identifies cascade conditions (integer instruction data dependencies within and between instruction groups) and inserts pipeline bubbles when necessary. A bubble is a cycle in which no instruction is executed; it is necessary when the required data is not available.
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5.2.3 D1 (Resource Allocation) 


D1 assigns available resources within the integer unit to individual instructions in the group selected during D0. All cases of data forwarding (or bypass) are resolved in this stage. All operand register indexes are selected and assigned to individual register file ports. These resources remain constant throughout the execution of the instructions.


The two address registers selected during D0 are read via two dedicated register file ports during D1. This data is used in D2 to compute a load or store virtual address. The data for these may also be forwarded from currently executing instruction groups.


Branch target addresses are generated in D1 from the extension words selected in D0 and the PC value of the branch instruction within the group. Next PC (nPC) values are also generated.


5.2.4 D2 (Read Operands) 


Stage D2's primary function is to read the operand registers selected in the preceding D1 stage. In addition, the address operands read during D1 are combined in the virtual address adder. The result is a 32-bit virtual address that is used to reference the Memory Management Unit (MMU) and data cache in subsequent stages. During D2, any data forwarding paths required for execution are set up to transfer data in the cycles that follow.
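The address arithmetic can be sketched as follows. The code forms a load/store virtual address the way a SPARC V8 register-plus-immediate memory instruction specifies it: a register value plus a 13-bit sign-extended immediate (simm13), wrapped to 32 bits. The function names are hypothetical. The register-plus-register form simply adds two register values.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low 'bits' bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def effective_address(rs1: int, simm13: int) -> int:
    # 32-bit virtual address = rs1 + sign-extended 13-bit immediate (mod 2**32).
    return (rs1 + sign_extend(simm13, 13)) & 0xFFFFFFFF


print(hex(effective_address(0x2000, 0x10)))    # 0x2010, as in ld [%l2+0x10]
print(hex(effective_address(0x2000, 0x1FF0)))  # 0x1ff0 (0x1FF0 encodes -16)
```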


5.2.5 E0 (Execute First Stage)


The SSP has two execution stages. E0 is the primary execution stage. Most arithmetic logic unit operations (ALUops) complete in E0. During E0, the data operands read from the register file during D2 are passed through one of two ALUs or the shifter. A maximum of two integer results can be generated in E0; only one of them may be generated by the shifter. These results are then presented as input to the E1 cascaded ALU and sent into many forwarding paths.


For memory references, the virtual address generated in D2 is used in E0 to begin accessing the translation lookaside buffer (TLB) and the data cache. Only the low-order 12 bits of the virtual address are needed to begin cache lookup. The high-order bits are supplied by the MMU in the E1 phase for tag comparison with the physically cached data. The MMU must inform the data cache unit in E0 if there is an access exception in the current group of instructions. The integer unit (IU) is also informed of the access exception in the E1 stage. If it is not yet known whether an exception must be reported to the current group (due to TLB or cache misses), the pipeline is stalled at this stage until all exception sources have been resolved.
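The reason 12 bits suffice can be shown numerically. The sketch assumes a 16-Kbyte, four-way set-associative data cache with 32-byte lines and 4-Kbyte pages. These sizes are assumptions here, not figures stated in this section. Under them, each way holds 4 Kbytes, so the set index and byte offset come entirely from bits <11:0>. Those bits are the same in the virtual and physical address. The physical page number from the TLB is needed only later, for the tag compare. All names are hypothetical.

```python
from typing import Optional

CACHE_BYTES = 16 * 1024  # assumed data cache capacity
WAYS = 4                 # four-way set associative
LINE_BYTES = 32          # assumed line size
PAGE_BYTES = 4 * 1024    # assumed page size

SETS = CACHE_BYTES // (WAYS * LINE_BYTES)         # 128 sets
OFFSET_BITS = (LINE_BYTES - 1).bit_length()       # 5 bits of byte offset
INDEX_BITS = (SETS - 1).bit_length()              # 7 bits of set index
PAGE_OFFSET_BITS = (PAGE_BYTES - 1).bit_length()  # 12

# Index and offset fit inside the untranslated page offset, va<11:0>.
assert OFFSET_BITS + INDEX_BITS == PAGE_OFFSET_BITS == 12


def split_virtual(va: int) -> tuple[int, int]:
    """Return (set index, byte offset), both taken from va<11:0> only."""
    return (va >> OFFSET_BITS) & (SETS - 1), va & (LINE_BYTES - 1)


def lookup(va: int, ppn: int, tags: list[list[int]]) -> Optional[int]:
    """Return the way whose stored physical tag matches the TLB's page number."""
    index, _ = split_virtual(va)          # can start before translation finishes
    for way, tag in enumerate(tags[index]):
        if tag == ppn:                    # compare needs the translated page number
            return way
    return None


tags = [[-1] * WAYS for _ in range(SETS)]   # -1 marks an empty way
tags[split_virtual(0x12345)[0]][2] = 0x77   # hypothetical resident line
print(lookup(0x12345, 0x77, tags), lookup(0x12345, 0x78, tags))  # 2 None
```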
Floating-point operations are dispatched to the floating-point unit (FPU) during E0. From this point forward, floating-point operations (FPops) execute in the FP pipeline explained in Section 5.5.


5.2.6 E1 (Execute Second Stage)


The second stage of execution can generate at most one additional integer ALU result, made up of one computed result from E0 plus one “pass-through” value, read or bypassed by D2 (no "double cascades" are allowed). This result is generated in the cascaded ALU, which takes the computed results from the E0 ALU or shifter as inputs. All execution results except FPops from the current instruction group are available by the end of the E1 stage, including data returned from the data cache. Results generated in E1 are delayed a cycle before they can be used as address operands, so a load whose address depends on an E1 result incurs one cycle of pipeline bubble. Condition codes generated in E1 are likewise delayed a cycle before they can be used in resolving conditional branches.
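The forwarding rules in this section and in the load examples later in this chapter can be summarized as a small lookup, assuming the producer is in the immediately preceding group. An E0 result feeds the next group with no delay. An E1 result (load data or a cascaded ALU result) costs one bubble when it is used as an address operand or as branch condition codes, and none when it is used as an ALU operand. This is a simplified, hypothetical model, not a complete interlock specification.

```python
from enum import Enum


class Stage(Enum):
    E0 = "E0"  # ALU or shifter result in the first execute stage
    E1 = "E1"  # load data or cascaded-ALU result


class Use(Enum):
    ALU = "alu operand"
    ADDRESS = "address operand"
    BRANCH = "branch condition codes"


def bubbles(producer: Stage, use: Use) -> int:
    """Bubbles between adjacent dependent groups (simplified sketch)."""
    if producer is Stage.E1 and use in (Use.ADDRESS, Use.BRANCH):
        return 1  # E1 results reach the address adder / branch logic a cycle late
    return 0      # everything else is forwarded in time


# Example 5-1: shifter (E0) result feeds the ld address -> no bubble.
# Example 5-2: cascaded add (E1) result feeds the ld address -> one bubble.
# Load data (E1) feeding the following add -> no bubble.
print(bubbles(Stage.E0, Use.ADDRESS), bubbles(Stage.E1, Use.ADDRESS),
      bubbles(Stage.E1, Use.ALU))
```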


5.2.7 Write Back (WB) Results 


When stage E1 has completed, all non-FPop results are guaranteed to be available. The primary action in the WB pipeline stage is to write back these results into the register file. Only instructions that complete correctly with no prior exceptions are written back. The WB stage executes at the same time as the E0 stage of the next instruction group. Forwarding paths are used to transmit data between successive groups. The integer unit updates the register file during WB, and the data cache normally updates its contents when an ST instruction has appeared in E0-E1.


SuperSPARC can operate either directly connected to MBus or with the SuperSPARC MultiCache Controller (MXCC). This choice has a major impact on the behavior of store instructions. When connected to the MXCC, SuperSPARC assumes the existence of an external cache, and the SuperSPARC data cache behaves as a write-through cache: all store instructions that modify the internal cache also write their data through to the external cache.


When connected directly to MBus, SuperSPARC’s data cache operates as a copy-back cache. Store data remains in the cache until either the line containing the data is replaced or a snoop on the bus forces a copy-back. The actions taken on a snoop hit are further explained in Chapter 17. The cache also implements a write-allocate policy: should a store miss in the cache, the block containing that data is brought in from memory. The store is then performed locally and, consistent with the copy-back policy, memory is not updated. In this configuration SuperSPARC does not assume the presence of an external cache.
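The two store policies can be contrasted with a toy model of a single store miss. It assumes a dictionary-backed cache and memory, with a list standing in for the store buffer. All names are hypothetical, and line granularity, timing, and bus snooping are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class DataCache:
    """Toy, word-granular data cache; a hypothetical model, not the hardware."""
    with_mxcc: bool  # True: write-through, no write-allocate; False: copy-back
    lines: dict[int, int] = field(default_factory=dict)
    dirty: set[int] = field(default_factory=set)
    store_buffer: list[tuple[int, int]] = field(default_factory=list)

    def store(self, addr: int, value: int, memory: dict[int, int]) -> None:
        hit = addr in self.lines
        if self.with_mxcc:
            # Write-through: the store is always sent toward the external
            # cache (via the store buffer); the internal copy changes only on a hit.
            if hit:
                self.lines[addr] = value
            self.store_buffer.append((addr, value))
        else:
            # Copy-back with write-allocate: on a miss, bring the data in from
            # memory first, then write locally; memory stays stale until copy-back.
            if not hit:
                self.lines[addr] = memory.get(addr, 0)
            self.lines[addr] = value
            self.dirty.add(addr)


mem = {0x100: 1}
wt = DataCache(with_mxcc=True)
cb = DataCache(with_mxcc=False)
wt.store(0x100, 7, mem)
cb.store(0x100, 7, mem)
print(0x100 in wt.lines, wt.store_buffer)              # False [(256, 7)]
print(cb.lines[0x100], mem[0x100], 0x100 in cb.dirty)  # 7 1 True
```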
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5.3 Pipeline Example Overview 


The SuperSPARC processor's pipeline is straightforward for simple instruction sequences, for example, ALUops. The complexity increases quickly for memory reference and control transfer instructions. The following sections describe these cases in detail. Standard load and store sequences are presented first in Section 5.4, followed by floating-point operations (FPops) in Section 5.5. All forms of control transfer, together with SAVE and RESTORE, are then described in Sections 5.6 and 5.7. Section 5.8 describes how the pipeline deals with exceptions.


Figure 5-1 describes how the pipeline works in simple cases. It shows no pipeline stalls or bubbles; the variety of ways in which these can arise is dealt with later in this chapter. Figure 5-1 is similar to the pipeline diagrams used throughout the chapter to describe the operations of the processor.


Figure 5-1. Basic Pipeline Description


All pipeline stages are identified. In general, the contents of the instruction 
group will be indicated in the left-side heading. Significant operations and in- 
teractions are included in the boxes for individual stages. 


Notice how the groups overlap in time. While instruction group two is executing E0, instruction group one is executing WB and instruction group three is executing D1.
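The overlap can be reproduced with a short script that assumes the idealized case of Figure 5-1. It models eight stages, two per clock cycle (so each stage takes half a cycle), with a new group starting every cycle and no stalls. The snapshot reports which stage each group occupies at a given half-cycle. Function names are illustrative.

```python
from typing import Optional

STAGES = ["F0", "F1", "D0", "D1", "D2", "E0", "E1", "WB"]


def stage_of(group: int, half_cycle: int) -> Optional[str]:
    """Stage of 'group' (1-based) at 'half_cycle' (0-based), or None."""
    # Group g enters F0 one full cycle (two stages) after group g-1.
    pos = half_cycle - 2 * (group - 1)
    return STAGES[pos] if 0 <= pos < len(STAGES) else None


def snapshot(half_cycle: int, groups: int = 3) -> dict[int, Optional[str]]:
    return {g: stage_of(g, half_cycle) for g in range(1, groups + 1)}


# Half-cycle 7: group one is in WB, group two in E0, group three in D1.
print(snapshot(7))  # {1: 'WB', 2: 'E0', 3: 'D1'}
```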
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5.4 Memory References 


Load and store instructions are frequent operations in SPARC programs. In a typical program, as many as 30% of the instructions are loads or stores. Since SuperSPARC executes up to three instructions per cycle, it may be required to execute a memory reference nearly every cycle. Such a task stretches the SuperSPARC processor design a great deal.


To maximize performance, SuperSPARC has removed restrictions associated with prior RISC designs. In particular, many sources of interlocks on load instructions have been removed. This allows SuperSPARC to execute a load instruction followed immediately, in the next instruction group, by a dependent ALUop (one with a register dependency on the load).


LD and ST instructions that hit in the internal data cache execute in a single cycle. This includes all byte, half-word, word, and double-word references. Up to two other instructions may be included in the instruction group along with the memory reference. Stores are generally buffered. When SuperSPARC is used with MXCC, stores take a single cycle to execute, regardless of whether the stores hit in the cache.


5.4.1 Load Operation


Example 5-1 shows an LD instruction surrounded by arithmetic operations. 
For simplicity, the sequence uses single-instruction groups, forced by the de- 
pendencies in the code. The code sequence being executed demonstrates the 
use of many data forwarding paths. 


Example 5-1. Simple Load Operation

add %l0,%l1,%l2
!---Split (can't cascade into shifter)
sll %l2,2,%l2
!---Split (address dependency)
ld [%l2+0x10],%l3
!---Split (load data dependency)
add %l3,%l4,%l5





The execution of this code sequence through the pipeline is shown in 
Figure 5-2. 
Figure 5-2. Basic Load Pipeline Sequence (Example 5-1)
[Pipeline diagram. Group One: add %l0,%l1,%l2; Group Two: sll %l2,2,%l2; Group Three: ld [%l2+0x10],%l3; Group Four: add %l3,%l4,%l5. Annotation: load after ALUop.]
The add and shift instructions execute in E0 of group one and group two, respectively. The instructions pass data through forwarding paths from the add result into the shifter, and then from the shifter result into the virtual address adder for the load. The 0x10 offset is extended into a 32-bit value in the D1 stage. The offset extension word and the forwarded version of register %l2 are added, and the result is passed to both the data cache and the MMU, which are accessed in parallel. When a hit is identified in the TLB, the physical page number is extracted and passed on to the data cache. In the meantime, the cache has completed reading all data and tags for the four lines in the set of the memory location (four-way set associative cache). The tags are compared with the physical page number from the MMU. When the proper line is identified, the appropriate bytes become the result of the load instruction. The load result is also forwarded into the next E0 execute stage for the last add instruction, and it is written into the register file in the WB stage.


If a cache or MMU miss had occurred, pipeline bubbles would have been inserted. The E0 and E1 stages of the load would have been repeated until all the misses were satisfied. If any errors had occurred, they would have been reported to the E0 stage (while it was being repeatedly executed).


Note the use of a dependent shift instruction to force a split between the add and the shift. If the shift were replaced by an add, the second add would be considered a cascaded instruction, and the two would be grouped together. This would result in a pipeline bubble between the second add and the load, as shown in Example 5-2. The total execution time would be identical.
Example 5-2. Load Operation With Bubble

add %l0,%l1,%l2
add %l2,2,%l2 !Cascade
!---Split (address dependency)
!bubble
!---Split (address dependency)
ld [%l2+0x10],%l3
!---Split (load data dependency)
add %l3,%l4,%l5





5.4.2 Store Operation 


Store operations are similar to loads in many ways. The address computation, cache lookup, and MMU access are performed in the same manner as for loads. The primary difference is the handling of the data. Example 5-3 demonstrates a store operation. To illustrate the handling of multiple-instruction groups, the instruction sequence is more complex than the previous example.


Example 5-3. Store Operation

add %l0,%l1,%l2
sub %o1,%o2,%o3
!---Split (two write ports)
and %o3,%o4,%o5
st %o5,[%l2+%l3]
!---Split (only one memory reference)
ld [%l2+0x10],%l3
!---Split (load data dependency)
add %l3,%l4,%l5





Note that the store in Example 5-3 requires data from the AND. This is ex- 
ecuted with no delay, as the data to be stored is not required until the WB stage 
when it is actually written to the cache and/or memory. 


Figure 5-3 is very similar to Figure 5-2, with the addition of another memory reference instruction. Many of the forwarding paths used in the previous diagram are no longer used, while others are illustrated. The store address is computed in the D2 stage, then used to check the MMU and cache tags in E0/E1. Assuming all protection checks pass, the write is actually performed during WB.
Figure 5-3. Store Pipeline Operation
[Pipeline diagram. Group One: add %l0,%l1,%l2 and sub %o1,%o2,%o3; Group Two: and %o3,%o4,%o5 and st %o5,[%l2+%l3]; Group Three: ld [%l2+0x10],%l3; Group Four: add %l3,%l4,%l5.]


Operation of the data cache on an ST miss depends on whether SuperSPARC is operating directly on the MBus or is connected to the MultiCache Controller. When SuperSPARC is working with MXCC, if the store operation had missed the data cache, the timing would have been identical. The store data would have been written only to the store buffer and not to the data cache; in this case, SuperSPARC’s data cache does not write-allocate on misses. For SuperSPARC directly on the MBus, however, if the store operation had missed the data cache, SuperSPARC’s data cache would have brought the data from memory, retried the store (which would have hit by then), and then written the new data into the cache line. This process is called write allocation, and the new data would be copied back to memory when the cache line was replaced.
In some cases, a load needs to read recently written data. If this data is in the cache, it is returned immediately. For a SuperSPARC processor with MXCC, this data may not be in the cache if the store missed in the internal data cache and the data is still in transit to the external cache or to main memory. The drain rate of the store buffer depends on the external system. In such cases, the store buffer is checked for a copy of the needed data; this is called "store buffer snooping." If the data is present, the processor requests that the store buffer be drained; this is called "store buffer copy-out." The processor waits until the requested data is no longer in the store buffer, and then reads the data back from memory, as in a normal load. SuperSPARC does not forward data from the store buffer to satisfy read requests. For SuperSPARC directly on the MBus, a store guarantees that the data cache has the new data. See Chapter 17 for more details.
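The load path described above can be sketched as follows, assuming a FIFO store buffer of (address, value) pairs and a dictionary standing in for external memory. The function name is hypothetical. The point the sketch captures is that a matching entry is never forwarded: the buffer is drained until the address is gone, and only then is the value read from memory.

```python
from collections import deque


def load_with_snoop(addr: int, cache: dict[int, int],
                    store_buffer: deque, memory: dict[int, int]) -> int:
    """Load path sketch: cache hit, else snoop and drain the store buffer."""
    if addr in cache:                       # hit: data returned immediately
        return cache[addr]
    # Store buffer snooping: is a store to this address still in transit?
    while any(a == addr for a, _ in store_buffer):
        # Store buffer copy-out: drain the oldest entry to memory (FIFO).
        a, v = store_buffer.popleft()
        memory[a] = v
    # No forwarding from the buffer: the value is read back from memory.
    return memory.get(addr, 0)


buf = deque([(0x40, 1), (0x80, 2), (0x40, 3)])
mem: dict[int, int] = {}
print(load_with_snoop(0x40, {}, buf, mem), list(buf))  # 3 []
```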
5.5 Floating-Point Pipeline


Floating-point instructions utilize both the integer and floating-point units. SuperSPARC distinguishes between two types of floating-point instructions:


□ FPevs

FPevs (floating-point events), which include loads and stores of floating-point registers, are channeled directly from the integer unit to the floating-point registers.


□ FPops

FPops are placed in a four-entry-deep floating-point queue by the integer unit.


The floating-point unit reads the first instruction in this queue, performs the op- 
eration requested, and stores the results in its registers. 


The four-entry floating-point queue enables SuperSPARC's IU to continue 
executing integer instructions while the floating-point unit takes care of the 
floating-point operations independently. If the integer unit tries to issue a float- 
ing-point operation to a full floating-point queue, the integer pipeline will stall 
until an entry is cleared from the queue. 


If a floating-point operation generates an exception, the floating-point unit halts operation. The integer unit is alerted to the trap on its next floating-point instruction issue. The floating-point exception trap handler should then store the contents of the queue before handling the exception. Each entry in the queue comprises a 32-bit FPop and its 32-bit virtual address. Once the trap has been handled, the saved addresses enable the trap handler to reissue the stored instructions into the queue. In this fashion SuperSPARC is able to implement deferred floating-point traps. The floating-point queue is also called the floating-point deferred trap queue.
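The four-entry queue's behavior can be sketched in a few lines. Each call to issue() represents the integer unit handing over one FPop, and each call to complete() represents the FPU retiring the oldest entry. The class, its methods, and the opcode values are hypothetical. A full queue makes issue() report a stall. An exception freezes the queue so that the handler can drain it front-first as (address, opcode) pairs, in the order repeated STDFQ instructions would store them.

```python
from collections import deque


class FPQueue:
    DEPTH = 4  # four-entry floating-point queue

    def __init__(self) -> None:
        self.entries: deque[tuple[int, int]] = deque()
        self.exception = False

    def issue(self, pc: int, opcode: int) -> bool:
        """Return False (integer pipeline must stall) if the queue is full."""
        if len(self.entries) == self.DEPTH:
            return False
        self.entries.append((pc, opcode))
        return True

    def complete(self, raises: bool = False) -> None:
        """FPU finishes the oldest FPop; on an exception it halts instead."""
        if self.exception or not self.entries:
            return
        if raises:
            self.exception = True  # entry stays queued for the trap handler
        else:
            self.entries.popleft()

    def drain_for_handler(self) -> list[tuple[int, int]]:
        # Trap handler saves the entries front-first, one doubleword each.
        saved = list(self.entries)
        self.entries.clear()
        return saved


q = FPQueue()
results = [q.issue(0x1000 + 4 * i, 0x81A00840 + i) for i in range(5)]
print(results)  # [True, True, True, True, False]
q.complete(raises=True)
saved = q.drain_for_handler()
print(q.exception, len(saved))  # True 4
```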


SuperSPARC's floating-point pipeline is loosely coupled to the integer pipeline. A floating-point operation may be started every cycle. The latency of most floating-point instructions is three cycles. The FPU pipeline has five stages: Decode, Read, Execute, Last, and Write-back.
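The difference between issue rate and latency matters for dependent FPops. The count below rests on four assumptions: one FPop can start per cycle, latency is three cycles, forwarding lets a dependent FPop start as soon as its source result is ready, and queue-full stalls and long operations such as divide and square root are ignored. The function name is hypothetical.

```python
from typing import Optional


def fp_finish_cycle(deps: list[Optional[int]], latency: int = 3) -> int:
    """Cycle in which the last FPop completes (idealized sketch).

    deps[i] is the index of the earlier FPop whose result op i needs, or None.
    FPops start in program order, at most one per cycle.
    """
    start: list[int] = []
    for dep in deps:
        earliest = start[-1] + 1 if start else 0            # one start per cycle
        if dep is not None:
            earliest = max(earliest, start[dep] + latency)  # wait for the source
        start.append(earliest)
    return start[-1] + latency if start else 0


independent = [None, None, None, None]
chain = [None, 0, 1, 2]  # each FPop uses the previous result
print(fp_finish_cycle(independent), fp_finish_cycle(chain))  # 6 12
```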
Floating-point instructions are dispatched relatively late in the processor pipeline. They are not issued to the FPU until the E0 stage of the integer pipeline. Once issued, the floating-point instructions proceed through the FPU's pipeline, with FD occurring simultaneously with E1 in the integer pipeline, as in Figure 5-4. Forwarding paths are provided to chain the result of one floating-point operation to a source of a subsequent operation.


Figure 5-4. Floating-Point Pipeline
In several cases, the floating-point pipeline becomes visible to the integer pipeline. When a branch on floating-point condition codes (FBfcc) instruction is issued, the processor may need to wait until a preceding FCMP instruction has completed. The pipeline also becomes visible when a floating-point store instruction is executed: the integer pipeline waits in the E0/E1 stage until the requested data is available from the FPU.


In some cases, the floating-point queue may become full, typically when many long-latency floating-point instructions (e.g., divide and square root) or highly dependent operations are issued. If the floating-point queue becomes full, the integer pipeline stalls in E0 and waits for a queue entry to be freed by the completion of an FPop.


Since floating-point instructions are issued late in the pipeline (E0), and the actual arithmetic is not begun until one cycle later (WB), SuperSPARC may issue a load and a dependent FPop simultaneously. This is demonstrated in Example 5-4.
Example 5-4. Floating-Point Pipeline

ldd [%l0],%f2
faddd %f2,%f0,%f6
add %l0,0x8,%l0
!--- Split (Three instruction max)
ldd [%l0],%f4
add %l0,0x4,%l0
fmuld %f4,%f0,%f8
!--- Split (Three instruction max)
ldd [%l0+4],%f10
cmp %l0,0x100
be Loop
!--- Split (Branch, Three instructions)
faddd %f6,%f8,%f0





Example 5-4 shows many of SuperSPARC's strengths. All but the last group contain three instructions. The floating-point operations are grouped with the load instructions they depend on. The data returns from SuperSPARC's data cache at the end of the E1 stage; it is immediately used by the FPU's FRD stage and then by its FE stage.


Figure 5-5 shows some of the code in Example 5-5 being executed. 


Example 5-5. A Floating-Point Pipeline Example

ldd [%l0+%l1],%f2
faddd %f2,%f0,%f4
!--- Break
Figure 5-5. Floating-Point Pipeline Diagram
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5.5.1 Floating-Point Instructions 


There are two types of floating-point instructions:
□ FPop (floating-point operations).
□ FPev (floating-point events).


The FPevs (floating-point events) are executed by the IU and FPU together, but they do not enter the floating-point queue (FQ); these include:


□ LDF/STF (load and store to floating-point registers).
□ LDFSR/STFSR (load and store to floating-point status register).
□ STDFQ (store floating-point queue).
□ IMUL/IDIV (integer multiply and integer divide operations).


The FPops include all FBfcc instructions, floating-point arithmetic instructions, 
and all floating-point moves, compares, and converts. These instructions all 
specifically enter the floating-point queue. 


5.5.2 Floating-Point Queue 
The FQ is a FIFO queue that holds a maximum of four FPops. Should the queue become full, the integer pipeline stalls on any group containing an FPop until an entry clears. FSR.qne is reset (equals zero) in the FPU status register to indicate that the queue is empty. FPevs do not enter the floating-point queue.
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The contents of the FQ should be stored when the FPU goes into exception 
mode. This enables SuperSPARC to implement resumable, deferred floating- 
point exceptions. Any exception will remain pending until another floating- 
point operation is requested. 


IDIV and IMUL can execute only in exception mode or when the FQ is empty. 


Normal floating-point operations will not resume until the integer multiply or di- 
vide completes. 


5.5.3 Floating-Point Execution Times 


Most floating-point instructions have a three-cycle latency. More complicated 
instructions take longer; see Section 11.3 for more details. 
5.6 Conditional Branches in the Pipeline
The handling of branches is critical to a reduced instruction set computer (RISC) processor's performance. SuperSPARC's branch implementation can execute both taken and untaken branches efficiently.


Untaken branches have a slight performance edge. The peak performance 
through an untaken branch is 3 instructions per cycle, while peak performance 
through a taken branch is 2.3 instructions per cycle. The difference comes 
from the delayed branch instruction, which, in the taken case, must be ex- 
ecuted as a single-instruction group. Due to usual code scheduling restric- 
tions, this peak performance will not generally be attained, and, in practice, the 
difference between taken and untaken branches is less. 
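The 2.3 figure can be reproduced with simple arithmetic under one assumed steady-state pattern, which this manual does not itself derive. The pattern consists of two full three-instruction groups, the second ending with the branch, followed by the single-instruction delay group that a taken branch requires. The untaken case keeps every group full:

```latex
\mathrm{IPC}_{\text{taken}} = \frac{3 + 3 + 1}{3\ \text{cycles}} = \frac{7}{3} \approx 2.33,
\qquad
\mathrm{IPC}_{\text{untaken}} = \frac{3 + 3 + 3}{3\ \text{cycles}} = 3
```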


SuperSPARC implements branch prediction; it always attempts to fetch the 
target of the branch. The pipeline assumes, however, that the branch is unta- 
ken, and it relies on the instruction queue having several sequential instruc- 
tions available. The instructions at the branch target are fetched into a special 
buffer, the branch target queue. Once the direction of the branch has been re- 
solved, the appropriate instructions are executed, either from the prefetch 
queue or the branch target queue. 


When a branch instruction is encountered, the pipeline continues to execute 
normally for one cycle. In this additional cycle, the delay instruction is ex- 
ecuted, possibly along with other instructions in the untaken stream. If the 
branch is taken, these instructions from the untaken stream should not have 
begun execution. SuperSPARC terminates the execution of these instructions 
for a taken branch as soon as the branch sequence has been resolved. This 
is called aborting the instructions. These instructions must be canceled before 
any machine state is modified. 
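The abort rule can be sketched as a filter over the delay group. The sketch assumes, as in Examples 5-7 and 5-8, that the group's first instruction is the branch's delay instruction and that any later instructions come from the untaken stream. The function is hypothetical and does not model the SPARC annul bit.

```python
def completed_in_delay_group(delay_group: list[str], taken: bool) -> list[str]:
    """Instructions of the delay group that are allowed to complete.

    delay_group[0] is the delay instruction, which always completes (the
    annul bit is not modeled); later entries belong to the untaken path and
    are aborted if the branch is taken.
    """
    if not delay_group:
        return []
    return delay_group[:1] if taken else list(delay_group)


group = ["add %l1,%l2,%l3", "st %l3,[%g1+0x100]"]
print(completed_in_delay_group(group, taken=False))  # both complete (Example 5-7)
print(completed_in_delay_group(group, taken=True))   # only the add (Example 5-8)
```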


A compare followed by a branch is a typical branch sequence. SuperSPARC executes this sequence by grouping the branch instruction together with one or more preceding instructions. The delay instruction is executed in the next group. Example 5-6 shows a typical instruction sequence.


Principles of Operation 


Subject to Change Without Notice 





Conditional Branches in the Pipeline


Example 5-6. Branch Sequence

subcc %l0,0x100,%g0
bz TaknTarg
add %l1,%l2,%l3
st %l3,[%g1+0x100]
and %o0,%o1,%o2

TaknTarg: st %g0,[%g1+0x100]





This same code sequence can demonstrate both taken and untaken 
branches. The same control transfer mechanism can illustrate all forms of 
branches, jumps, and calls. 


5.6.1 Untaken Branch 


The sequence from Example 5-6 is shown in Example 5-7 with grouping for the untaken branch case.


Example 5-7. Untaken Branch

subcc %l0,0x100,%g0
bz TaknTarg
!---Split (Split after CTI)
add %l1,%l2,%l3
st %l3,[%g1+0x100]
!---Split (No more write ports)
and %o0,%o1,%o2

TaknTarg: st %g0,[%g1+0x100]





The instructions at the label TaknTarg are not executed, since the branch is not taken. Notice that two instructions (add, store) are executed in the delay instruction group. The instructions beyond the third group are not significant for the example.
Figure 5-6 shows the branch target fetch, which is performed in case the branch is taken. The condition codes (from the subcc instruction) are resolved in the branch group's E0 pipeline stage. This condition code is used immediately to determine the branch direction in the delay group's D2 stage, the target group's D0 stage (by selecting the correct queue), and the following group's F0 stage (by selecting the correct prefetch program counter). Thus the branch is resolved after one cycle of uncertainty, and fetches once again follow the correct execution stream.


Figure 5-6. Untaken Branch Pipeline 
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5.6.2 Taken Branch 


The taken branch case is more complicated. The most significant change is 
that instructions that begin execution in the delayed instruction group are 
aborted and never complete execution. Aborted instructions are boxed in the 
pipeline diagrams. The apparent grouping of instructions in the taken case is 
shown in Example 5-8. 




Example 5-8. Taken Branch Case Instruction Grouping

subcc %l0,0x100,%g0
bz TaknTarg
!---Split (Split after CTI)
add %l1,%l2,%l3
!---Split (abort st %l3,[%g1+0x100])

TaknTarg: st %g0,[%g1+0x100]





The instruction grouping is the same as for the untaken branch case, up to the
delay group. In the delay group, the grouping begins the same as for an
untaken branch, but the grouping is modified by aborting the store instruction
when the branch is resolved as taken. The taken branch pipeline is shown in
Figure 5-7.
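The architectural effect of the delay instruction in both cases can be sketched with the SPARC V8 PC/nPC rule. This is a minimal Python model under stated assumptions. It does not model the SuperSPARC pipeline. The function name and addresses are hypothetical, every instruction is taken to be 4 bytes, and annulled branches are not modeled. In both cases the delay instruction (the add) executes, and only the address that follows it differs.

```python
def executed_addresses(branch_pc: int, target: int, taken: bool, count: int = 4) -> list:
    """Addresses executed starting at a (non-annulled) conditional branch.

    Each instruction executes at PC, then PC <- nPC and nPC <- nPC + 4,
    except that a taken branch sets nPC to its target. The delay
    instruction at branch_pc + 4 therefore executes in both cases.
    """
    pc, npc = branch_pc, branch_pc + 4
    trace = []
    for _ in range(count):
        trace.append(pc)
        if pc == branch_pc and taken:
            next_npc = target      # taken: fetch after the delay instruction goes to the target
        else:
            next_npc = npc + 4     # untaken branch or ordinary instruction: fall through
        pc, npc = npc, next_npc
    return trace


# Hypothetical addresses: branch at 0x100, TakenTarg at 0x200.
print([hex(a) for a in executed_addresses(0x100, 0x200, taken=False)])
# ['0x100', '0x104', '0x108', '0x10c']
print([hex(a) for a in executed_addresses(0x100, 0x200, taken=True)])
# ['0x100', '0x104', '0x200', '0x204']
```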


Figure 5-7. Taken Branch Pipeline

[Figure not reproduced: Taken Conditional Branch pipeline diagram showing Groups One through Four across stages F0, F1, D0, D1, D2, E0, E1, and WB; aborted instructions are boxed.]


5.6.3 Branch Couple 


SPARC allows for a limited set of Control Transfer Instruction (CTI) couples.
They execute similarly to normal branches. SuperSPARC executes all the legal
CTI couples correctly and, in conformance with SPARC V.8, does not support
the implementation-dependent cases in the SPARC V.8 Specification.


The JMPL instruction is an unconditional branch that uses a register as the 
source for the branch target. Other branch instructions compute the target ad- 
dress from an immediate operand and the current program counter value. 
JMPL requires an extra cycle because it has to read the register file before is- 
suing a branch target fetch. The extra cycle is injected into the pipeline after 
the delay instruction of the JMPL (i.e., before execution of the target instruc- 
tion). 
5.7 Procedure Call and Return


SPARC defines primitive operations for calling procedures and returning from
them under a variety of circumstances and language models. The CALL
instruction is a primitive that branches to a subroutine, saving the value of the
PC in a register. The JMPL instruction branches to an address in a register and
can be used for subroutine return.


Register windows are managed with the SAVE and RESTORE instructions.
SAVE allocates a new register window, saving the previous one. The RESTORE
instruction deallocates a register window and restores the previous
window. SAVE and RESTORE can be used in the procedure call and return
sequences, respectively. Both instructions also perform an addition that can
be used for other functions in the calling and returning sequences, such as
updating the stack frame pointer.


5.7.1 CALL, SAVE, and RESTORE


CALL executes in the same manner as a taken branch, including execution of
the delay instruction, except that the current PC value is written into register
r15 (%o7). The value is written in the E0 stage of the group containing the
CALL instruction. CALL is grouped with previous instructions in the same
manner as branches. It does, however, require a register file write port, and so
the grouping may be somewhat constrained by available resources.


SAVE and RESTORE each execute as a single-instruction group. Each is identical
to an ADD instruction, except that it changes the CWP during the decode
phase. SAVE and RESTORE each require a single cycle to execute.


5.7.2 Current Window Pointer (CWP) Pipeline 


The SAVE instruction decrements the CWP by 1, while the RESTORE instruction
increments the processor's CWP by 1. These instructions also perform
an add operation. The sources for this add are from the old register window,
while the destination is in the new window. For simplicity, SuperSPARC
executes both these instructions as single-instruction groups. To make
instructions before and after SAVE and RESTORE operations reference the correct
windows, multiple current window pointers (CWPs) are maintained in the
pipeline. Depending on an instruction's position relative to the CWP change, the
appropriate CWP value is chosen.
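The window selection described above can be sketched as follows. The model records, for each instruction in program order, which window supplies its sources and which receives its result. It assumes eight register windows (NWINDOWS = 8), which this section does not state. It wraps the CWP modulo NWINDOWS as SPARC V8 does, and it ignores window overflow and underflow traps (WIM). Instructions are plain strings for illustration only.

```python
NWINDOWS = 8  # assumption: number of register windows (not stated in this section)


def window_use(program, cwp):
    """Return (source_window, destination_window) for each instruction.

    SAVE reads its add operands in the old window and writes its result in the
    new window (CWP - 1); RESTORE does the same with CWP + 1. Every other
    instruction reads and writes the window current at its position.
    """
    result = []
    for op in program:
        if op == "save":
            new_cwp = (cwp - 1) % NWINDOWS
        elif op == "restore":
            new_cwp = (cwp + 1) % NWINDOWS
        else:
            new_cwp = cwp
        result.append((cwp, new_cwp))
        cwp = new_cwp  # later instructions see the updated CWP
    return result


print(window_use(["add", "save", "add", "restore", "add"], cwp=0))
# [(0, 0), (0, 7), (7, 7), (7, 0), (0, 0)]
```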




Note: 


Any manual modification of the CWP (using WRPSR) should be made in
accordance with the delayed-write requirements of The SPARC Architecture Manual.
Several cycles may be required for a CWP update to take effect.
5.8 Exceptions and the Pipeline


Exceptions in the pipeline must ensure that all instructions before the excep- 
tion complete while the instruction causing the exception and those after it 
(both in its group and in the following groups) do not. 


In Example 5-9 the two arithmetic instructions form a cascade. The LDDF
instruction is between the two arithmetic instructions. Also, the destination
registers of the two arithmetic instructions are identical. The problem arises
in this case if the LDDF instruction induces an exception due to a
page-protection violation or a misaligned memory address. In this case, the first
instruction (ADD) must complete and write back its result. The second instruction
(LDDF), however, must not complete, and neither may the AND that follows it.


Example 5-9. Parallel Execution of a Group of Instructions

!--- Split (Start of group)
add    %l1, %l2, %l3
ldd    [%o0+0x100], %f2
and    %l3, 0x20, %l3
!--- Split (3 instructions)





If the LDDF instruction had completed without incurring an exception, the re- 
sult of the ADD would not have been written to the register file—it would simply 
have been used as a temporary value passed on to the AND. When the excep- 
tion occurs, this is no longer true, and the result must be written to the register 
file. 


5.8.1 Exception Pipeline 
Example 5-9, with some surrounding instructions, is used in Figure 5-8 to
illustrate the exception-handling pipeline. In Figure 5-8 an exception is
reported for the LDDF instruction. The exception forces the ADD instruction to
write its result and prevents the AND from doing so. The exception-handling
logic then takes control of the pipeline. All subsequent instructions in the
pipeline are aborted. The store buffer is copied out to memory, and the pipeline
stalls until the copy-out is complete. In the next cycles, the PSR and CWP are
modified to reflect the exception state of the processor. An instruction fetch to
the proper trap handler in the trap table is requested. The exception PC
and nPC are written into r17 and r18 (%l1, %l2). Finally, the instructions in the
handler begin execution.
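The in-order cutoff at the faulting instruction can be sketched in Python. This is a behavioral model only, with hypothetical addresses and class names. It assumes the group contains no control transfer instruction, so the saved nPC is PC + 4. It does not model the cascade hardware, where the ADD's result would normally only be forwarded to the AND.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Instr:
    text: str             # assembly text, for readability only
    pc: int               # hypothetical address of the instruction
    faults: bool = False  # True if this instruction signals an exception


def resolve_group(group: List[Instr]) -> Tuple[List[str], Optional[Tuple[int, int]]]:
    """Return (instructions whose results are written, saved (PC, nPC) or None).

    Assumes the group contains no control transfer instruction, so the nPC of
    the faulting instruction is simply its PC + 4.
    """
    committed: List[str] = []
    for ins in group:
        if ins.faults:
            # The faulting instruction and every later one are aborted; its
            # PC and nPC are the values the trap writes into r17 and r18.
            return committed, (ins.pc, ins.pc + 4)
        committed.append(ins.text)
    return committed, None


example_5_9 = [
    Instr("add %l1, %l2, %l3", 0x1000),
    Instr("ldd [%o0+0x100], %f2", 0x1004, faults=True),
    Instr("and %l3, 0x20, %l3", 0x1008),
]
print(resolve_group(example_5_9))
# (['add %l1, %l2, %l3'], (4100, 4104))
```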
Figure 5-8. Exception-Handling Pipeline
Interrupts are handled in the same manner as exceptions. The interrupt
request pins are sampled on the rising edge of the clock. In the next cycle, the
highest priority interrupt is selected. This may be from either the interrupt
request pins or from an internally generated interrupt (breakpoints). The request
level of the interrupt is compared against the enable trap field and processor
interrupt level field in the PSR (PSR.ET and PSR.PIL). If traps are enabled and
the interrupt request level (IRL) is higher than PSR.PIL, the instruction group at
the E0 pipeline stage will, in the next cycle, trap to the interrupt handler.


For the interrupt to be accepted, there must be a valid instruction in the E0
stage. If there is no valid instruction, the interrupt is not taken until an
instruction arrives at E0. This ensures that there is a valid PC to report as the
interrupted program counter.
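The acceptance conditions above can be collected into one predicate. This is a minimal sketch, and the function name is hypothetical. It adds one rule from SPARC V8 that this section does not restate: level 15 is nonmaskable and is taken even when PSR.PIL is 15. Priority selection among several requests and breakpoint reporting are not modeled.

```python
def interrupt_taken(irl: int, pil: int, et: bool, valid_e0: bool) -> bool:
    """Decide whether a pending interrupt request traps in the next cycle.

    irl      -- interrupt request level (0 means no request)
    pil      -- PSR.PIL
    et       -- PSR.ET (traps enabled)
    valid_e0 -- True if a valid instruction is in the E0 stage
    """
    if irl == 0 or not et:
        return False  # no request, or traps disabled: the request stays pending
    if not valid_e0:
        return False  # wait until an instruction reaches E0 so a valid PC can be reported
    # Assumption from SPARC V8 (not stated in this section): level 15 is nonmaskable.
    return irl > pil or irl == 15


print(interrupt_taken(irl=5, pil=3, et=True, valid_e0=True))   # True
print(interrupt_taken(irl=5, pil=3, et=True, valid_e0=False))  # False
```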


In a multiple-instruction group, the interrupt is reported to the last valid
instruction in the group. This instruction is then aborted. Breakpoints are
reported not to the last valid instruction but to the instruction that caused the
breakpoint detection. The interrupt pipeline is identical to the exception pipeline
(see Figure 5-8). It behaves just as if an exception had been reported to the last
valid instruction in the E0 group.
5.8.3 Return From Trap (RETT) Pipeline


The return from trap instruction executes in cooperation with the JMPL that
must immediately precede it (see The SPARC Architecture Manual). Execution
of a RETT is similar to a JMPL but also updates the current window pointer
(CWP) and restores S and ET. The JMPL returns to the delay instruction, and
the RETT returns to the target of the branch. If there was no previous branch,
NPC is set to PC+4.


A RETT executes as a single instruction group. It incurs no additional pipeline 
cycles. The preceding JMPL introduces a bubble into the pipeline after the 
RETT. 


The two-instruction pair is needed because the trapping instruction might have
been the delay instruction of a delayed control transfer operation (DCTI). In
that case, the NPC would not be PC+4 and would have to be restored from the
register containing the saved nPC.


Example 5-10 illustrates the usual code sequence used to return from a trap. 


Example 5-10. Return From Trap 
jmpl   %r17, %g0
rett   %r18





The instruction at the target of the JMPL is executed as a single-instruction
group, since it may not be contiguous with the return address in the RETT. The
pipeline operation for the sequence is shown in Figure 5-9.
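The following Python sketch traces the PC/nPC pair through Example 5-10. It shows why the pair restores a non-sequential nPC. It is an architectural model, not a pipeline model, and the function name and addresses are hypothetical. The JMPL's target is the saved PC in %r17. The RETT, executing in the JMPL's delay slot, supplies the saved nPC in %r18.

```python
def after_jmpl_rett(jmpl_pc: int, r17: int, r18: int):
    """Return (PC, nPC) after 'jmpl %r17, %g0' followed by 'rett %r18'."""
    pc, npc = jmpl_pc, jmpl_pc + 4  # JMPL executing; RETT is its delay instruction
    pc, npc = npc, r17              # after JMPL: PC is the RETT, nPC is the saved PC
    pc, npc = npc, r18              # after RETT: PC is the saved PC, nPC is the saved nPC
    return pc, npc


# Hypothetical trap on the delay instruction (0x1004) of a branch to 0x2000.
print([hex(x) for x in after_jmpl_rett(0x40, r17=0x1004, r18=0x2000)])
# ['0x1004', '0x2000']
```

Execution resumes at the trapped delay instruction and then continues at the branch target, not at 0x1008. A single jump could not express this.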
Figure 5-9. Return From Trap Pipeline
The performance available from the SuperSPARC processor (SSP) can vary
significantly with the actual instruction sequence of the program. This chapter
presents a number of guidelines that assembly language programmers and
compiler code generators can use to increase the performance of programs
on the SSP. The rules used by the SSP to group instructions are also
presented.


Topic

Spreading the Use of Critical Resources
Code-Generation Guidelines
Instruction Grouping












6.1 Performance Limiters 
The performance of a program running on the SSP depends on how the
program's instructions are formed into groups for superscalar processing. The
ordering of the instructions in the program can greatly impact the way they are
grouped and thus the number of cycles used to complete the program. The
compiler code generator or assembly language programmer can plan or
schedule the instructions to make the best use of the SSP's internal resources
and achieve greater performance.


The performance of programs on the SSP is sensitive to many factors. The
significant performance limiters vary greatly between programs. Classes of
limiters include:


- Branch frequency and direction.
- Memory-reference patterns.
- Floating-point operation scheduling.
- Instruction ordering.
- Data and register dependencies.


In floating-point programs, branch performance is rarely a limiter. This is be- 
cause loops are often unrolled. In most cases, floating-point programs are 
memory-reference limited rather than arithmetic-limited. Since only a single 
memory reference can be executed per instruction group, interleaving 
memory references with nearly anything else improves performance. 


Most programs are limited by branch performance or load/store bandwidth. In 
particular, a taken branch can execute only a single instruction in the branch 
delay group. Load and store operations are often bunched together. This pre- 
vents other instructions from being executed in parallel with the memory refer- 
ences. 


Cache performance is also critical to achieving maximum performance. Many 
routines are small enough for their entire code and data sets to be contained 
in the SSP's on-chip caches. Cold-start penalties, which are usually insignifi- 
cant, will degrade the performance of such routines. Most larger programs, 
however, do not fit entirely in the on-chip caches. These programs typically 
lose 20% to 30% of their performance to cache miss penalties. Prefetch logic,
in some cases, will limit the decrease in performance. Localizing code and 
data references within a program can lead to substantial performance im- 
provements. 
6.2 Performance of Existing Code


The SSP is designed to execute code from older SPARC compilers efficiently. 
In general, the SSP executes this code at between 1.4 and 1.6 instructions per
cycle (IPC), or 0.6 to 0.7 cycles per instruction (CPI). This decreases to about 
1.1 IPC for large programs not fully contained in the cache. The performance 
of floating-point programs is often greater. 


This performance level can be increased significantly with code generated ex- 
plicitly for the SSP. The remainder of this chapter considers low-level code- 
scheduling issues when generating code to run on the SSP. 
6.3 Resource Allocation


The SSP must group instructions according to available hardware resources. 
The hardware resources and restrictions are described in Figure 6-1. 


Figure 6-1. SuperSPARC Processor Hardware Resources and Restrictions

[Figure not reproduced. Recoverable labels: ALUop, MEMop, BRop, FPop; Read Ports, Write Ports.]
6.3.1 Ports to Memory


The most basic hardware resource limitation is a single port to memory. This 
is what restricts the SSP to one load or store per cycle. Similarly, a single port 
to the instruction cache restricts branch performance. 


6.3.2 Ports to the FPU 


Although the SSP has an independent floating-point adder and multiplier, only 
one floating-point operation per cycle can be dispatched. A load or store be- 
tween a floating-point register and memory, however, can be done in the same 
cycle as a floating-point operation. 


6.3.3 Integer Register Write Ports 


Two write ports to the integer register file are available. All arithmetic instruc- 
tions use one of these ports. Load instructions use one write port, and load 
double instructions use both write ports. Note that floating-point load opera- 
tions do not use any integer register write ports. Example 6-1 is a code seg- 
ment that loads double floating-point registers and executes in two cycles. 
Example 6-1. Integer Register Write Ports: Double Floating-Point Registers

add    %l0, %l1, %l2
and    %l2, 0xff, %l3
ldd    [%o0+0x100], %f2
!--- split (Three instructions)
add    %l4, %l5, %l6
and    %l6, 0xff, %l7
ldd    [%o0+0x108], %f4
!--- split (Three instructions)





A write port is used when the %g0 register is used as a destination because
the condition codes must be carried along with the instruction. A write port is
also used by a store instruction.


Example 6-2 is similar to Example 6-1 but loads into integer registers and 
executes in four cycles. 


Example 6-2. Integer Register Write Ports: Integer Registers

add    %l0, %l1, %l2
and    %l2, 0xff, %l3
!--- split (No more write ports)
ldd    [%o0+0x100], %i2
!--- split (No more write ports)
add    %l4, %l5, %l6
and    %l6, 0xff, %l7
!--- split (No more write ports)
ldd    [%o0+0x108], %i4
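The difference between Example 6-1 and Example 6-2 can be reproduced with a greedy in-order grouping model. The model checks only three limits from this section: three instructions per group, two integer register write ports, and one memory reference. All names are hypothetical. It deliberately ignores cascade, shifter, CTI, and pipeline-hold rules, so it is a sketch, not the SSP's grouping logic.

```python
from dataclasses import dataclass
from typing import List

MAX_INSTRS = 3           # at most three instructions per group
MAX_INT_WRITE_PORTS = 2  # two integer register file write ports
MAX_MEM_OPS = 1          # one load or store per group


@dataclass
class Op:
    text: str
    int_write_ports: int  # 1 for ALU ops, LD, and ST; 2 for integer LDD; 0 for FP loads
    is_mem: bool = False


def form_groups(ops: List[Op]) -> List[List[str]]:
    """Greedy in-order grouping that checks only the three limits above."""
    groups: List[List[str]] = []
    cur: List[str] = []
    ports = mem = 0
    for op in ops:
        fits = (len(cur) < MAX_INSTRS
                and ports + op.int_write_ports <= MAX_INT_WRITE_PORTS
                and mem + int(op.is_mem) <= MAX_MEM_OPS)
        if cur and not fits:
            groups.append(cur)  # split before this instruction
            cur, ports, mem = [], 0, 0
        cur.append(op.text)
        ports += op.int_write_ports
        mem += int(op.is_mem)
    if cur:
        groups.append(cur)
    return groups


def example(ldd_ports: int, dest: str) -> List[Op]:
    # Example 6-1 (FP destination, 0 ports) or Example 6-2 (integer destination, 2 ports)
    return [
        Op("add %l0, %l1, %l2", 1), Op("and %l2, 0xff, %l3", 1),
        Op(f"ldd [%o0+0x100], {dest}2", ldd_ports, True),
        Op("add %l4, %l5, %l6", 1), Op("and %l6, 0xff, %l7", 1),
        Op(f"ldd [%o0+0x108], {dest}4", ldd_ports, True),
    ]


print(len(form_groups(example(0, "%f"))))  # 2 groups, as in Example 6-1
print(len(form_groups(example(2, "%i"))))  # 4 groups, as in Example 6-2
```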





6.3.4 Integer Arithmetic Units


The SSP can perform up to two arithmetic logic unit operations (ALUops) in
an instruction group. This is implemented internally with three separate ALUs
and a shifter. Each of these can produce one result per cycle. One of the
arithmetic logic units (ALUs) can use the output of either the other ALU or the
shifter as its input.

The number of ALUs cannot limit the machine's performance, since no more
than two results can be stored by the register file. Only a single shifter exists,
however, which means that a result cannot be cascaded into a shift operation.


The shift result is produced early enough for it to be the source of a cascaded 
operation. This allows for such common combinations of operations as 
shift&add and shift&compare. 
6.4 Spreading the Use of Critical Resources


A single simple guideline can assist greatly in generating code that performs
well on the SSP: interleave as many different classes of operations as
possible, spreading members of the same class as far as possible from each
other. A simple measure of the minimum execution time attainable for a routine
is shown in Equation 6-1.


Equation 6-1. Minimum Execution Cycles

$$\mathrm{MinCycles} = \max\left(\text{Memory References},\ \text{Floating-Point Operations},\ \text{Integer Operations},\ \text{Branch Operations} \times 2\right)$$


The rule holds when one, and possibly two, of the terms are close to the maxi- 
mum. As more terms get larger, the likelihood of achieving or approaching the 
minimum number of cycles decreases. 


In order to approach the minimum above, all of the classes of operations must
be interleaved as much as possible. In the worst case, the maximum number
of execution cycles is shown in Equation 6-2.


Equation 6-2. Maximum Execution Cycles

$$\mathrm{MaxCycles} = \text{Memory References} + \text{Floating-Point Operations} + \text{Integer Operations} + (\text{Branch Operations} \times 2)$$
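The two equations can be evaluated side by side for a given operation mix. This sketch follows Equations 6-1 and 6-2 exactly as they appear in this text, including the integer-operation term as printed. The function name and the example operation counts are hypothetical.

```python
def cycle_bounds(memory_refs: int, fp_ops: int, integer_ops: int, branches: int):
    """Return (MinCycles, MaxCycles) per Equations 6-1 and 6-2 as written above."""
    terms = [memory_refs, fp_ops, integer_ops, 2 * branches]  # each branch counts as 2 cycles
    return max(terms), sum(terms)


# Hypothetical loop body: 4 memory references, 2 FPops, 6 integer ops, 1 branch.
print(cycle_bounds(4, 2, 6, 1))  # (6, 14)
```

The gap between the two bounds shows how much scheduling is at stake. Interleaving the classes moves execution toward the maximum of the terms rather than their sum.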
6.5 Code-Generation Guidelines


The SSP's dynamic grouping logic compensates locally for minor scheduling 
variations. The sequence in Example 6-3 executes in the same number of 
cycles as the sequence in Example 6-4. 


Example 6-3. Code Scheduling Example 1

ld     [%l0], %g1
!--- split (only one memory reference)
ld     [%l1], %g4
add    %g1, 0x100, %g3
!--- split





Example 6-4. Code Scheduling Example 2

ld     [%l0], %g1
!--- split (dependent ld-use)
add    %g1, 0x100, %g3
ld     [%l1], %g4





Example 6-4 demonstrates the self-aligning nature of execution on the SSP.
The pipeline aligns instruction groups based on the positions of the critical
operations in the code (the memory references, floating-point operations
(FPops), and branches).


The important consideration is that the critical operations do not become serial 
but rather are interleaved to increase parallel operation. Most sequences of 
instructions have many optimal schedules. 


6.5.1 Reduction of Branches 


As shown in Equation 6-1 and Equation 6-2, each branch tends to increase 
the execution time of a sequence by about two cycles. Thus it is more signifi- 
cant to remove a branch than to remove a memory reference. 

Code should be unrolled wherever possible. This is a performance boost on 
most machines, especially on the SSP. The addition of as many as four other 
instructions is generally better than a single branch. Where possible, arithme- 
tic logic, rather than sequences of branches, will improve performance. 


6.5.2 Allocation of Delay Instructions 


Since the delay instruction of a taken branch is forced to be a single-instruction
group, it is very important to make the best use of that instruction. For
example, in unrolled LINPACK, optimal performance cannot be achieved unless a
memory reference is placed in the delay instruction.
For maximum performance, the delay instruction should be used to execute
one of the critical performance-limiting operations in the code. 


Reorganizing code to properly fill the delay instruction may require adding in- 
structions to the code. This is a trade-off that should be made carefully. Even 
though adding instructions can increase performance on the SSP, it will de- 
crease performance on any non-superscalar machine. In addition, it expands 
the size of the program, which may increase the number of instruction cache 
misses. One alternative is to use software pipelining, which often allows the
same performance levels without increasing the number of instructions. This
is accomplished by spreading the execution of each loop iteration over several
actual trips through the code. Each trip through the resulting loop body executes
the final operations of the previous iteration, the core of the current iteration,
and the first operations (prologue) of the next iteration.
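The overlap that software pipelining produces can be made concrete by listing which stage of which iteration runs on each trip. This sketch assumes a hypothetical three-stage split of each iteration ("load", "compute", "store"). It only prints the schedule and does not generate SPARC code. The first and last trips are the partial prologue and epilogue that a compiler would emit outside the loop.

```python
def pipelined_schedule(n_iters: int, stages=("load", "compute", "store")):
    """List, for each trip through the loop body, the (stage, iteration) pairs it runs.

    Stage s of iteration i runs on trip i + s, so a steady-state trip finishes
    one iteration, works on the middle of another, and starts a third.
    """
    depth = len(stages)
    schedule = []
    for trip in range(n_iters + depth - 1):
        work = []
        for s, name in enumerate(stages):
            i = trip - s  # iteration that is in stage s on this trip
            if 0 <= i < n_iters:
                work.append((name, i))
        schedule.append(work)
    return schedule


for trip, work in enumerate(pipelined_schedule(4)):
    print(trip, work)
```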


Annulled branches should be used only if there is no alternative. An annulled 
branch saves only the code space used for the delay instruction. The cycle in 
which the delay instruction of the annulled branch would have been executed 
is still required. 


6.5.3 Reduction of Floating-Point Register Dependencies 


Floating-point register dependencies can be significant performance limiters.
No floating-point register can be modified until all pending operations in the
floating-point queue that either produce or use that register have completed.
The Floating-Point Unit (FPU) has a fairly long execution pipeline, and
performance suffers when there are frequent FP data dependencies. The SSP's
performance may be increased by using more loads and stores to move
floating-point data back and forth between memory and registers, as opposed to
overusing a single register and introducing dependencies.


Note that these guidelines do not apply to integer operations. Integer
operations have much shorter latency, and integer register dependencies are
handled more effectively. Example 6-5 and Example 6-6 show bad and good
examples, respectively, of FP register dependencies.
Example 6-5. Bad Example of FP Register Dependencies

ld     [%i0], %f1
!--- split (Only one memory reference)
ld     [%i0 + 4], %f0
fmuls  %f0, %f1, %f0
!--- split (Only one memory reference)
ld     [%i0 + 8], %f1
fadds  %f0, %f1, %f2
!--- split
!--- PIPELINE STALL FOR 2 CYCLES!
!--- until fmuls produces %f0 and
!--- finishes with %f1





Notice that the last load operation reuses register %f1. This prevents the load
operation from being executed until after the FMULS has completed. The main
pipeline will stall for two cycles. A better schedule is shown in Example 6-6.


Example 6-6. Good Example of FP Register Dependencies

ld     [%i0], %f1
!--- split (Only one memory reference)
ld     [%i0 + 4], %f0
fmuls  %f0, %f1, %f0
!--- split (Only one memory reference)
ld     [%i0 + 8], %f3
fadds  %f0, %f3, %f2
!--- split





By changing the register used in the last load and add operations, the two pipe- 
line stalls are removed completely. 


The FADDS uses %f0 as a source operand and can only start execution when
the FMULS has produced a result for %f0. The FADDS is issued into the FQ,
however, and the main pipeline does not stall. Another optimization that may
be applied to this code is using a load double to replace the first two memory
references. Such optimizations must be done carefully.


6.5.4 Other Floating-Point Code Issues 
The following section provides examples of floating-point code that illustrate
the SSP's FPU operation. Each example is preceded by a brief explanation
of the significant details.
6.5.4.1 Throughput


Example 6-7 shows how to take full advantage of FPop latency (for example,
the three-cycle latency of FADDS) and execute each FPop without any pipeline
stall. This arrangement achieves the highest throughput.


Example 6-7. Throughput Example

fadds  %f0, %f1, %f20
!--- split
fadds  %f2, %f3, %f21
!--- split
fadds  %f4, %f5, %f22
!--- split
fadds  %f6, %f7, %f23
st     %f20, [%l1]
!--- split
fadds  %f8, %f9, %f24


6.5.4.2 One Stall


Example 6-8 is similar to the previous code sequence, except that the ST is
attempted a cycle earlier. Because of the three-cycle required latency of
FADDS, a bubble will be inserted to stall the pipeline before the ST can be
completed. The SSP's grouping logic places FADDS %f4,%f5,%f22 together with
ST %f21, [%l1] in the same group (this is done prior to the FPU detecting any
dependency). Therefore, when ST is stalled, the group remains the same.


Example 6-8. One-Stall Example

fadds  %f0, %f1, %f20
!--- split
fadds  %f2, %f3, %f21
!--- split
fadds  %f4, %f5, %f22    ! (issued)
st     %f21, [%l1]       ! (issued)
!--- split
stall
!--- split
! (fadds %f4,%f5,%f22 completed)
! (st %f21, [%l1] completed)
!--- split
6.5.4.3 FCMP/FBfcc Latency


Example 6-9 demonstrates the latency of an FCMP-FBfcc pair. An FCMP
requires three cycles before the floating-point condition codes are resolved for
an FBfcc use. Arranging the code as given in Example 6-9 allows each cycle
to complete without any pipeline stall.


Example 6-9. FCMP/FBfcc Latency

!--- split
fcmps  %f6, %f7
!--- split
fadds  %f0, %f1, %f21
!--- split
fadds  %f2, %f3, %f22
!--- split
fadds  %f4, %f5, %f23
!--- split
fbne   L1
!--- split





6.5.4.4 FCMP/FBfcc Stall 


The FPU detects whether there is a pending FCMP and stalls the pipeline if
it encounters an FBfcc that may need the FCMP's result. In Example 6-10, an
FBfcc is issued immediately after an FCMP. The SSP issues the FCMP but
then, before executing the FBNE, it stalls the pipeline for three cycles until the
FCMP has completed.


Example 6-10. FCMP/FBfcc Stall

!--- split
fcmps  %f6, %f7
!--- split
stall
!--- split
stall
!--- split
stall
! (fcmps %f6,%f7 completed)
!--- split
fbne   L1
!--- split
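Examples 6-9 and 6-10 fit a simple rule: each instruction group issued between the FCMP and the FBfcc hides one of the three cycles of FCMP latency. The sketch below encodes that rule. The linear behavior for one or two intervening groups is an assumption interpolated from the two examples, not something this section states, and the names are hypothetical.

```python
FCMP_TO_FBFCC_LATENCY = 3  # cycles before an FCMP's condition codes are usable (from the text)


def fbfcc_stall_cycles(groups_between: int) -> int:
    """Stall cycles an FBfcc incurs, given the number of instruction groups
    issued between it and the pending FCMP (assumed linear model)."""
    return max(0, FCMP_TO_FBFCC_LATENCY - groups_between)


print(fbfcc_stall_cycles(3))  # Example 6-9: three FADDS groups in between -> 0
print(fbfcc_stall_cycles(0))  # Example 6-10: FBfcc right after the FCMP -> 3
```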
6.5.4.5 Register Dependency on Store


Example 6-11 shows how a floating-point register dependency with a store 
may stall the pipeline. The SSP groups the FADDS, ST, and CMP instructions 
together and then begins execution. The FADDS instruction requires three 
cycles to compute results, thus stalling the integer pipeline for three cycles be- 
fore these instructions can complete. 


Example 6-11. f Register Dependency on Store

!--- split
fadds  %f0, %f1, %f21    ! (issued)
st     %f21, [%l1]       ! (issued)
cmp    %l7, 2            ! (issued)
!--- split
stall
!--- split
stall
!--- split
stall
! (fadds %f0,%f1,%f21 completed)
! (st %f21, [%l1] completed)
! (cmp completed)
!--- split





Note in Example 6-11 that a floating-point exception from the FADDS will be
reported to the ST. An fp_exception is a deferred trap and is always reported
to the next FPop or FPev.


6.5.4.6 f Register Dependency on Load 


Example 6-12 shows how a floating-point register dependency before a load
may stall the pipe. The SSP groups the FADDS, LD, and the CMP, and then
begins execution. The LD cannot complete before the FADDS instruction is
completed, which takes three cycles; the pipeline is therefore stalled for that
amount of time. This is required to avoid destroying the source registers for
current FPops.
Example 6-12. f Register Dependency on Load

!--- split
fadds  %f0, %f21, %f1    ! (issued)
ld     [%l1], %f21       ! (issued)
cmp    %l7, 2            ! (issued)
!--- split
stall
!--- split
stall
!--- split
stall
! (fadds %f0,%f21,%f1 completed)
! (ld [%l1], %f21 completed)
! (cmp completed)
!--- split





Note that any floating-point exception from FADDS will be reported to the LD 
in Example 6-12. 


6.5.4.7 f Register Data Forwarding (Output to Input) 


Example 6-13 demonstrates how a floating-point register dependency
between two FPops activates the data forwarding and does not cause the
pipeline to stall. The destination register of FADDS %f0,%f1,%f21 is a source
register for FADDS %f21,%f2,%f22 (both FPops are able to enter the FQ
immediately), but the second FADDS will not start until the first FADDS reaches
the FWB stage. Since the second FADDS receives the data from the forwarding
path, it does not wait until the first FADDS has written the data into the register.


Example 6-13. f Register Data Forwarding (Output to Input)

!--- split
fadds  %f0, %f1, %f21
!--- split
fadds  %f21, %f2, %f22
!--- split





6.5.4.8 f Register Data Forwarding (Input to Output)


Example 6-14 demonstrates that a floating-point register dependency be- 
tween two FPops does not cause either the integer or floating-point pipeline 
to stall. Both instructions are able to complete immediately. 
Example 6-14. f Register Data Forwarding (Input to Output)

!--- split
fadds  %f0, %f1, %f21
!--- split
fadds  %f2, %f3, %f0
!--- split





6.5.5 Spread Address Calculation and Memory Reference 


In order for the SSP to implement single-cycle memory references, the
address registers must be stable by the D2 pipeline stage of the memory
reference. This implies that a result for a register used for address computation
must be completed no later than the E0 stage of the previous instruction group.


Example 6-15 illustrates that, since the result of the shift is needed for the
load address calculation, the shift and the load must be executed in separate
groups. If the shift had not been producing data for that load, it could have been
grouped together with the load. With the address calculation dependency,
however, the shift will be grouped with previous instructions, and the load will
be grouped with subsequent operations.


Example 6-15. Address Calculation Dependency on Cascade 


sll    %l0, 0x3, %l0
!--- split (ALUop into memory reference address)
ld     [%o0+%l0], %l1





The next case, as shown in Example 6-16, is less common but occurs when
the address is calculated in a cascaded instruction group. The results of a
cascade are not available until the end of the E1 execution stage. Since the
memory reference requires data at the end of E0, it must include a pipeline
bubble.


Example 6-16. Spread Address Calculation and Memory Reference

sll    %l0, 0x3, %l0
add    %l0, %o0, %l0
!--- split (Cascade into memory reference)
bubble
!--- split
ld     [%l0], %l1
Note that Example 6-16 requires three cycles to execute rather than the two
cycles required in Example 6-15. A sequence such as this is used rarely:
only when complex address arithmetic is required.


A third case often occurs during linked list traversal and is illustrated in 
Example 6-17. A bubble is inserted when the results of one load are immedi- 
ately used as part of the address calculation for the next load. 


Example 6-17. Indirect Address Calculation

ld     [%l0], %l1
!--- split (load into memory reference)
bubble
!--- split
ld     [%l1], %l2





The pipeline stall is required for the same reason as in Example 6-16: the
results of a load instruction are not available until the end of the E1 pipeline
stage.


If any of the cases in Example 6-15, Example 6-16, and Example 6-17 can
be spread apart by moving other instructions between these dependencies,
performance will be increased.
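The three address-dependency cases can be summarized in one function. It takes the kind of instruction that produced the address register and the number of groups separating it from the memory reference. The rule for one or more intervening groups is an assumption extrapolated from the single-bubble examples, and the names are hypothetical. A plain ALUop producer still forces a group split (Example 6-15), but it adds no bubble.

```python
def address_bubbles(producer: str, groups_between: int) -> int:
    """Bubbles inserted before a memory reference whose address register was
    produced `groups_between` groups earlier (0 = the immediately preceding group).

    producer is "alu" (single ALUop or shift), "cascade" (result of a cascaded
    instruction group), or "load".
    """
    if producer not in ("alu", "cascade", "load"):
        raise ValueError(f"unknown producer kind: {producer}")
    # Cascade and load results appear only at the end of E1, one cycle too late
    # for a memory reference in the very next group.
    if producer in ("cascade", "load") and groups_between == 0:
        return 1
    return 0


print(address_bubbles("alu", 0))      # Example 6-15 -> 0
print(address_bubbles("cascade", 0))  # Example 6-16 -> 1
print(address_bubbles("load", 0))     # Example 6-17 -> 1
```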
6.6 Instruction Grouping


The SSP forms groups of instructions in the D0 pipeline stage by examining
the available instructions from its instruction queue. A set of grouping rules
decides which of the instructions will be selected for inclusion in the next group
to be executed. The SSP will select no instructions, the first instruction, the first
two instructions, or the first three instructions to form a group.


The size of the group is determined not only by the instructions of the program
but also by the instructions actually available in the instruction queue. The 
instruction queue may contain fewer than three instructions due to branches 
and instruction cache misses. 

When no instructions are available in a cycle, the grouping logic issues a zero- 
instruction group. Zero-instruction groups are called bubbles. Bubbles tra- 
verse the pipeline like other groups but do not execute instructions. A bubble 
cannot cause a pipeline hold but is subject to pipeline holds from other groups 
in execution. 

There are two classes of grouping rules. The classes are the split after rules 
and the split before rules. These rules determine whether the instruction group 
will be terminated before or after a particular instruction. These two classes 
contain rules based on the available instructions, rules based on the previous 
instruction group, and rules based on exceptions. 


The following sections provide descriptions of all these rules. 


There are many grouping rules that are required to handle pipeline hold condi- 
tions. These rules are not listed here. 
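The grouping step can be pictured as a loop that takes up to three instructions from the queue and stops at the first split. The sketch below is a simplified model, not the SSP's actual logic: the `Instr` record and its fields are hypothetical, only three sample rules are modeled (split after any CTI, split before an invalid queue entry, split before a second memory reference), and the hold-related rules are omitted.

```python
from dataclasses import dataclass

MAX_GROUP = 3  # the SSP issues at most three instructions per group


@dataclass
class Instr:
    # Hypothetical instruction record used only for this model.
    op: str               # mnemonic, e.g. "add", "ld", "ba"
    is_cti: bool = False  # control transfer instruction
    is_mem: bool = False  # load or store (uses the single memory port)
    valid: bool = True    # False models an empty instruction queue slot


def split_after(instr):
    """Split-after rule: end the group after any control transfer instruction."""
    return instr.is_cti


def split_before(group, candidate):
    """Split-before rules: invalid queue entry, or a second memory reference."""
    if not candidate.valid:
        return True
    return candidate.is_mem and any(i.is_mem for i in group)


def form_group(queue):
    """Return the instructions selected for the next group (possibly empty)."""
    group = []
    for instr in queue[:MAX_GROUP]:
        if group and split_after(group[-1]):
            break
        if split_before(group, instr):
            break
        group.append(instr)
    return group  # an empty list corresponds to a bubble


queue = [Instr("ld", is_mem=True), Instr("add"), Instr("st", is_mem=True)]
print([i.op for i in form_group(queue)])  # ['ld', 'add']
```

Keeping each rule as a separate predicate mirrors the split-after/split-before classification used in this section; further rules would be added as extra checks in the two functions.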


6.6.1 Split After Rules 


The following rules split the group after an instruction based on relations 
among the first three instructions in the queue. These rules will prevent any 
further instructions from being included in the current group. 


Split after first valid exception. 

Split after any control transfer instruction. 

Split after condition codes set in cascade. 

Split after MULScc destination not equal to source of next MULScc. 
Split after first instruction after annulled branch. 

Split after first instruction midway through a branch couple. 




6.6.1.1 Split After First Valid Exception 
The group is split after the first exception; this prevents instructions from
entering the pipeline after an instruction_access_exception has been signaled.
The exception travels through the pipeline and is actually taken only when the
instruction generating the exception reaches the E0 stage of the pipeline.
6.6.1.2 Split After Any Control Transfer Instruction


This rule splits the current group between any branch and the delay instruction 
that follows the branch. Any instructions that were grouped along with a branch 
appeared before it in the program. 


6.6.1.3 Split After Condition Codes Set in Cascade 
This rule prevents any additional instructions from being accepted after an
ALUop cascade in which the second ALU operation sets condition codes. See
Example 6-18.


Example 6-18. Split After Condition Codes Set in Cascade


add %l5, %l3, %g2
subcc %g2, %g0, %g0
!--- split after cc set in cascade
bge loop





6.6.1.4 Split After MULScc Destination Not Equal to Source of Next MULScc 


This rule prevents multiple MULScc instructions from being executed in paral- 
lel unless the second instruction uses the result of the first instruction. The nor- 
mal usage of MULScc has the destination equal to the source of the next 
MULScc. Other usage is uncommon, but it is architecturally legal. 
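As a sketch of the pairing condition, the check below models a MULScc as a tuple of register numbers and, as an assumption, compares the first instruction's destination with the second instruction's rs1 (the partial-product operand in the usual multiply-step loop). The `MulScc` type and `mulscc_group_size` function are hypothetical. Combined with the Split Before MULScc rule of Subsection 6.6.2.15, at most two MULScc instructions issue together.

```python
from typing import NamedTuple


class MulScc(NamedTuple):
    # Hypothetical model of "mulscc rs1, rs2, rd" using integer register numbers.
    rs1: int
    rs2: int
    rd: int


def mulscc_group_size(seq):
    """How many leading MULScc instructions may share one group (1 or 2).

    Assumes seq holds only MULScc instructions; pairing requires that the
    first result feeds the rs1 operand of the second.
    """
    if len(seq) >= 2 and seq[0].rd == seq[1].rs1:
        return 2
    return 1


O1, O4 = 9, 12  # %o1 = r9, %o4 = r12
step = MulScc(rs1=O4, rs2=O1, rd=O4)  # mulscc %o4, %o1, %o4
print(mulscc_group_size([step, step]))  # 2
```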


6.6.1.5 Split After First Instruction After Annulled Branch


This rule prevents multiple instructions from executing in the delay group of an
annulled branch. The instruction in the delay group of an annulled branch will
normally be executed only if the branch is taken. When this occurs, only the
first instruction after the branch is expected to be executed. If the previous
branch is untaken, the instruction in the delay position is not executed at all.
The SSP will begin executing this delay instruction, assuming the branch will
be taken, and, if necessary, abort it later.


6.6.1.6 Split After First Instruction Midway Through a Branch Couple


This rule prevents multiple instructions from being executed when a branch
couple is being processed. SPARC allows for a restricted number of branch
couple conditions. The SSP will only execute single instructions during branch
couples. Example 6-19 demonstrates this rule. The BE is executed in the
delay slot of the BA and, assuming the BE is taken, only the first instruction
at dest1 should be executed before execution continues at dest2.
Example 6-19. Split After First Instruction Midway Through a Branch Couple


ba dest1
!--- split after CTI
be dest2 !--- delay instruction
!--- split after CTI
(Never Executed)

dest1: add %l0, %l1, %l2
!--- split midway through branch couple
add %l3, %l4, %l5

dest2: add %l4, %l5, %l6





6.6.2 Split Before Rules 


Split before rules are typically used to split a group when a resource required 
by the candidate instruction is unavailable. For example, all instruction groups 
are split before a second memory reference. 


The "Split Before" rules are as follows: 
Split before invalid instruction queue entry.

Split before out of integer register read ports. 

Split before out of integer register write ports. 

Split before second memory reference. 

Split before second shift. 

Split before second Fpop. 

Split before second cascade. 

Split before cascade into shift. 

Split before cascade into JMPL. 

Split before cascade into memory reference address. 
Split before load data cascade use. 

Split before cascade in previous group into memory reference address. 


Split before sequential instruction. 
Split before control register read after previous SetCC.

Split before MULScc unless first one or two instructions.

Split before extended arithmetic from cc set in current group.

Split before delay group CTI unless first.

Split before CTI in JMPL delay unless RETT.


6.6.2.1 Split Before Invalid Instruction Queue Entry


The SSP uses this rule to wait for instructions from the instruction queue and
instruction cache. Instruction access exceptions are considered valid
instruction queue entries so that they can proceed down the pipeline and be
recognized. It is possible for fewer than three instructions to be valid in the queue.
This rule limits the maximum number of instructions that can be executed at a
given time to the number available in the instruction queue.

6.6.2.2 Split Before Out of Integer Register Read Ports


This rule prevents a group from using more register file read ports than are 
available. Four operand read ports are available. Memory address register 
ports are independent and do not affect this rule. 


6.6.2.3 Split Before Out of Integer Register Write Ports


This rule prevents a group from using more register file write ports, condition 
code ports, or result forwarding resources than are available. There are two 
operand write ports. 


Each ALUop (including SETHI and instructions sending a result to %g0) uses
one write port. Load instructions up to word size use one write port. Load
double integer (LDD) instructions use both write ports.
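The port limits can be checked by summing per-instruction costs. In this sketch the write-port costs follow the text (one for each ALUop, including SETHI; one for each load up to word size; two for LDD), while the `Op` record, the `kind` strings, and the caller-supplied count of operand reads are hypothetical. Address-register reads are excluded because they use separate ports, and the condition code and forwarding resources mentioned above are not modeled.

```python
from dataclasses import dataclass

READ_PORTS = 4   # integer operand read ports
WRITE_PORTS = 2  # integer operand write ports

# Write ports consumed per instruction kind; other kinds (e.g. stores) use none here.
WRITE_COST = {"alu": 1, "sethi": 1, "load": 1, "ldd": 2}


@dataclass
class Op:
    kind: str               # hypothetical label: "alu", "sethi", "load", "ldd", ...
    operand_reads: int = 0  # operand registers read, excluding address registers


def fits_ports(group):
    reads = sum(op.operand_reads for op in group)
    writes = sum(WRITE_COST.get(op.kind, 0) for op in group)
    return reads <= READ_PORTS and writes <= WRITE_PORTS


def split_before_ports(group, candidate):
    """True if adding candidate would exceed the read or write ports."""
    return not fits_ports(group + [candidate])


alu = Op("alu", operand_reads=2)
print(split_before_ports([alu, alu], alu))  # True: 6 reads, 3 writes
```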

6.6.2.4 Split Before Second Memory Reference


The SSP has only a single port to memory. This rule prevents a group from
attempting to use it twice and does not discriminate between floating-point and
integer loads and stores.


6.6.2.5 Split Before Second Shift

Although the SSP has several ALUs, only a single shifter is provided.


6.6.2.6 Split Before Second FPop

Only one FPop may be issued to the FPU in an instruction group.


6.6.2.7 Split Before Second Cascade


Only one operand may be cascaded into the cascade ALU in stage E1 of the
integer pipeline. This rule prevents two E0 results from being used to form a
cascaded operation.
6.6.2.8 Split Before Cascade Into Shift


The shifter must be used in the first execute stage and therefore may not be 
the second operation in a cascade. 


6.6.2.9 Split Before Cascade Into JMPL 


JMPL reads from the register file to calculate the target of its branch. The value
is required before the E0 stage. This requires that the register not be changed
in the group with the JMPL. This rule detects the existence of such a condition
and forces a split before the JMPL.


6.6.2.10 Split Before Cascade Into Memory Reference Address 


The register values used for address calculation by memory reference
instructions may not be produced in the same group as the memory reference, as
shown in Example 6-20. To perform single-cycle memory references, the
registers forming the address for a load or store must be available before
the D2 pipeline stage of the memory reference. This requires that they be
changed no later than the E0 pipeline stage of the previous instruction group.
This rule and the Split Before Previous Cascade Into Memory Reference
Address rule enforce this restriction.


Example 6-20. Split Before Cascade into Memory Reference Address 


add %l5, %l3, %g2
!--- split before cascade into mem ref
ld [%g2+0x10], %g1





6.6.2.11 Split Before Load Data Cascade Use 


A full cycle is required to access the on-chip data cache. The data from a load
is available after the E1 stage of the load. Therefore, it may not be used before
the E0 stage of the next instruction group. This rule prevents an instruction
from using that data in the group with the load instruction (during E0 or E1).
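Both same-group hazards, a register produced earlier in the group and then used in a memory address (Subsection 6.6.2.10), and load data used in the group that loads it, can be expressed as set intersections. The `Ins` record and the register-name strings below are hypothetical, and the sketch ignores the previous-group case described in Subsection 6.6.2.12.

```python
from dataclasses import dataclass, field


@dataclass
class Ins:
    # Hypothetical record: registers written, data operands read, address operands read.
    dests: set = field(default_factory=set)
    srcs: set = field(default_factory=set)
    addr_srcs: set = field(default_factory=set)
    is_load: bool = False


def same_group_split(group, cand):
    """Name the split-before rule that fires for cand, or return None."""
    produced = set().union(*(i.dests for i in group))
    if cand.addr_srcs & produced:
        return "cascade into memory reference address"
    loaded = set().union(*(i.dests for i in group if i.is_load))
    if cand.srcs & loaded:
        return "load data cascade use"
    return None


# Example 6-20: add %l5, %l3, %g2 followed by ld [%g2+0x10], %g1
add = Ins(dests={"g2"}, srcs={"l5", "l3"})
ld = Ins(dests={"g1"}, addr_srcs={"g2"}, is_load=True)
print(same_group_split([add], ld))  # cascade into memory reference address
```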


6.6.2.12 Split Before Previous Cascade Into Memory Reference Address 


The registers used for address calculations by memory reference instructions
must not have been generated in the cascade of the previous group. This rule
enforces an extension of the Split Before Cascade Into Memory Reference
Address rule. It requires that the address registers stabilize by the end of the
previous instruction group's E0 pipeline stage. In Example 6-21, the SUBcc and
the LD would otherwise have been grouped together but are split by this rule.
Example 6-21. Split Before Previous Cascade Into Memory Reference Address


add %l5, %l3, %g2
add %g2, %l2, %g4
!--- split (out of write ports)
subcc %g4, 0x10, %g0
!--- split before cascade in prev group
ld [%g4+0x10], %g1





6.6.2.13 Split Before Sequential Instruction


The SSP has a set of instructions that can only be executed as single instruc- 
tion groups. In addition, the SSP can be forced to execute all instructions as 
single instruction groups through an ASI visible control bit (ACTION.MIX). See 
Subsection 15.2.5. 


The set of instructions that are always single instruction groups is as follows: 
SAVE and RESTORE.

LDD and STD operations in integer registers.

All alternate space stores: STA, STDA, STBA, STHA.

Atomic operations: SWAP, LDSTUB.

All control register accesses: read and write PSR, ASR, WIM, Y.

RET.

FLUSH.

All software traps: Ticc.

Integer multiply: UMUL, UMULcc, SMUL, SMULcc.

Integer divide: UDIV, UDIVcc, SDIV, SDIVcc.

Tagged operations that can trap: TADDccTV, TSUBccTV.

Load or store FP status registers: LDFSR, STFSR, STDFQ.

Branch on floating-point condition code: FBfcc.


6.6.2.14 Split Before Control Register Read After Previous SetCC 


Integer condition codes are part of the processor status register (PSR). To
prevent them from being read while they are being modified, the processor forces
an extra cycle between RDPSR instructions and a previous arithmetic operation
that may have changed the condition codes.
6.6.2.15 Split Before MULSCC Unless First One or Two Instructions


The SSP executes MULScc as a single instruction group, or it executes two 
MULScc instructions as a group. MULScc is never grouped with any other in- 
structions. 


6.6.2.16 Split Before Extended Arithmetic From CC Set in Current Group


The SSP cannot use condition codes calculated in the E0 pipeline stage as
input to extended-precision arithmetic in the same group. This rule inserts a
split between the condition code computation and its use in extended
arithmetic (ADDX, ADDXcc, SUBX, SUBXcc).


6.6.2.17 Split Before Delay Group CTI Unless First 


If the group is a delay group, this rule splits the group before the second 
instruction. This rule prevents additional branches, other than true branch 
couples, from being executed in the delay group of a branch. 


6.6.2.18 Split Before CTI in JMPL Delay Unless RETT 


This rule allows an optimization of JMPL/RETT pairs and simplifies JMPL
couples. A full cycle is required in the pipeline between a JMPL and any other
CTI except RETT.
This chapter describes the operation of SuperSPARC instructions that require
a more detailed description than is already provided in The SPARC Architec- 
ture Manual. 
7.1 Integer Multiply (IMUL)


SuperSPARC implements integer multiply in all its forms (SMUL, SMULcc,
UMUL, and UMULcc), conforming to the SPARC architecture specification. It
is mentioned here only because the instruction is new to the Version 8 SPARC
architecture.


Integer multiply is implemented in the floating-point unit (FPU) of the proces- 
sor. Normally, these operations will wait until the completion of any pending 
floating-point operations (FPops) before execution (indicated by FP Queue 
(FQ) empty). If the FPU is in exception mode or exception-pending mode (see 
Subsection 11.1.9), integer multiply operations will proceed without waiting for 
the FQ to empty. Integer multiply will not cause any deferred floating-point ex- 
ceptions to be signalled. 
7.2 Integer Divide (IDIV)


For most numeric conditions, integer divide instructions (SDIV, SDIVcc, UDIV,
UDIVcc) work according to the SPARC architecture specification. Due to
limitations in the hardware, certain numeric cases cannot be completed. In these
cases, SuperSPARC will signal an illegal instruction trap (trap type 0x02).
This occurs when the 64-bit value comprised of {Y, rs1} has numerically
significant bits beyond bit 51. In effect, SuperSPARC implements only a 52-bit by
32-bit integer divide, compared to the 64-bit by 32-bit specification. This holds
true for both positive and negative numbers when the operation is a signed
divide.


System software is expected to emulate these integer divide operations when 
required. 
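An emulation handler first needs to tell whether a divide falls outside the 52-bit range. The sketch below forms the 64-bit dividend {Y, rs1} and tests it; the function names are hypothetical, and for signed divides it assumes the limit applies to the magnitude of the two's-complement dividend, which this section does not spell out.

```python
MASK32 = 0xFFFF_FFFF
LIMIT = 1 << 52  # SuperSPARC divides 52-bit dividends by 32-bit divisors


def dividend(y, rs1):
    """64-bit dividend formed from the Y register (high word) and rs1 (low word)."""
    return ((y & MASK32) << 32) | (rs1 & MASK32)


def to_signed64(x):
    return x - (1 << 64) if x & (1 << 63) else x


def divide_traps(y, rs1, signed):
    """True if the divide would raise illegal_instruction and need emulation."""
    value = dividend(y, rs1)
    if signed:
        value = abs(to_signed64(value))  # assumption: the limit applies to magnitude
    return value >= LIMIT


print(divide_traps(0x000F_FFFF, 0xFFFF_FFFF, signed=False))  # False: 2**52 - 1
print(divide_traps(0x0010_0000, 0, signed=False))            # True: 2**52
```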


Integer divide is implemented in the FPU of the processor. These operations
will normally wait until the completion of any pending FPops before execution.
If the FPU is in exception mode or exception-pending mode (see Subsection
11.1.9), integer divide operations will proceed without waiting for the FQ to
empty. Integer divide will not cause any deferred floating-point exceptions to
be signalled.
7.3 Write PSR (WRPSR)


SuperSPARC implements the write PSR instruction according to The SPARC 
Architecture Manual, with one qualification. Since SuperSPARC provides no 
coprocessor port, system software is prevented from trying to enable copro- 
cessor operations. 


If a WRPSR instruction attempts to set the PSR.EC bit, an illegal instruction
trap will be generated immediately. Since the EC bit can never be set, all
coprocessor instructions will generate cp_disabled traps (trap type 0x24).


The timing requirements of PSR write operations match The SPARC
Architecture Manual exactly. A three-instruction (not cycle) delay is required between
changing any PSR fields and using the contents. Note also that, when a JMPL/
RETT instruction pair is seen, SuperSPARC will use the PSR.PS (previous
supervisor) bit to check protections for the instruction fetch of the JMPL's target.


The PSR.IMPL (implementation number) field of the PSR is 0x40 and cannot 
be changed. 
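Both PSR details above can be checked with simple bit masks. The sketch assumes the SPARC Version 8 PSR layout (impl in bits 31:28, ver in bits 27:24, EC in bit 13), which comes from the architecture manual rather than this section; the function names and the sample PSR value are hypothetical.

```python
PSR_EC = 1 << 13        # enable coprocessor bit
PSR_IMPL_VER_SHIFT = 24


def psr_impl_ver(psr):
    """Combined impl/ver byte, bits 31:24 of the PSR (0x40 on SuperSPARC)."""
    return (psr >> PSR_IMPL_VER_SHIFT) & 0xFF


def wrpsr_traps(new_psr):
    """True if a WRPSR of this value would take an illegal instruction trap."""
    return bool(new_psr & PSR_EC)


psr = 0x4000_1FA0  # hypothetical: impl/ver 0x40, EF=1, PIL=15, S=1, ET=1
print(hex(psr_impl_ver(psr)), wrpsr_traps(psr))  # 0x40 False
print(wrpsr_traps(psr | PSR_EC))                 # True
```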
7.4 Flush (IFLUSH)


SuperSPARC implements FLUSH slightly differently from how The SPARC
Architecture Manual suggests. No cached information is explicitly flushed by the
instruction. The cache-consistency mechanisms are used to ensure that
caches always have correct data (both instruction and data caches). The
FLUSH operation simply causes an exact synchronization of all pending
activity.


When a FLUSH instruction is executed, it will cause SuperSPARC's store
buffer to be drained. This causes all pending bus activity to complete. Any cache-
coherency transactions (for instance, invalidation of instruction cache entries)
will occur as the store buffer clears. The internal processor pipeline and
instruction buffer will also be cleared. When the pipeline resumes execution after
the FLUSH, it will fetch data from the instruction cache, which is guaranteed
to be up to date with respect to all prior processor activity.




Note: 


The FLUSH instruction affects only a single processor. No other processors 
in a system will do a flush operation unless explicitly requested to do so (by 
their own FLUSH). This does not matter for most application programs. It is 
significant, however, to such programs as dynamic linkers. These programs 
must be written carefully to ensure that all processors see a consistent view 
of program memory under all circumstances. This may require modifying 
memory in a certain order and/or writing intermediate values to guard against 
temporary inconsistencies. Note that cache coherence is maintained on 
double words. 
7.5 Store Barrier (STBAR)

The STBAR instruction forces all store operations before it to complete before
any store operations after it are performed. STBAR is needed only in partial
store ordering (PSO) mode. SuperSPARC implements this functionality as
described in The SPARC Architecture Manual.


The instruction is implemented in the RDASR reserved instruction space.


The opcode is equivalent to RDASR 0x0f, %g0; the exact encoding is
0x8143c000.


The STBAR instruction sets the SBTAGS.SP bit in the last allocated (valid) entry.
If no entry is allocated, the STBAR instruction is remembered in the bus unit
arbitration logic. When the bit is set, the bus unit waits for the PEND_ input to
become inactive before issuing any stores after the one with the bit set.
Without the STBAR, the SBTAGS.SP bit will not be set, and the bus unit will continue
issuing stores as fast as the external cache controller can buffer them. This
would allow stores to be performed out of order in the system.


All this happens only in PSO mode. In TSO mode, the bus unit waits for PEND 
at all times. 


See Chapter 8 and especially Section 8.7 for more information. 
7.6 Signal User Emulation Request


SuperSPARC implements a special SuperSPARC-specific instruction called 
Signal Emulation (SIGM), that is used in conjunction with the on-chip JTAG- 
based emulation facilities provided by SuperSPARC. 


The instruction is implemented in the RDASR implementation-dependent
extended opcode space defined by the SPARC architecture. The opcode is
equivalent to RDASR 0x1f, %g0, which encodes to 0x8147c000.
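Both the STBAR and SIGM encodings can be checked by splitting them into SPARC format 3 fields (op, rd, op3, rs1, i). The field positions follow the SPARC Version 8 instruction format, where RDASR uses op = 2 and op3 = 0x28; the helper names are hypothetical.

```python
RDASR_OP3 = 0x28


def decode_format3(word):
    """Split a 32-bit SPARC format 3 instruction into its leading fields."""
    return {
        "op": (word >> 30) & 0x3,
        "rd": (word >> 25) & 0x1F,
        "op3": (word >> 19) & 0x3F,
        "rs1": (word >> 14) & 0x1F,
        "i": (word >> 13) & 0x1,
    }


def encode_rdasr(rs1, rd=0):
    """Encode RDASR %asr<rs1>, %r<rd> with the i bit clear."""
    return (2 << 30) | (rd << 25) | (RDASR_OP3 << 19) | (rs1 << 14)


print(hex(encode_rdasr(0x0F)))  # 0x8143c000 (STBAR)
print(hex(encode_rdasr(0x1F)))  # 0x8147c000 (SIGM)
```

Decoding 0x8143c000 gives rs1 = 0x0f and rd = 0 (%g0); decoding 0x8147c000 gives rs1 = 0x1f.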


The operation of this instruction is dependent on the state of the JTAG con- 
trolled MCMD register. If the MCMD.INITM bit is cleared, the SIGM instruction 
will be executed as a NOP. If the MCMD.INITM bit is set, execution of SIGM 
will cause immediate entry into emulation mode. (See Section 22.1 for details.) 


The MCMD.INITM is always initialized to zero at JTAG Tap controller reset. In 
order for the SIGM instruction to cause entry into emulation mode, the bit must 
be explicitly set by a remote emulation processor using JTAG scan. The 
MCMD.INITM bit cannot be set except through JTAG scan. 
This chapter describes the programmer's view of memory in a SuperSPARC-
based system. The exact view of memory is highly dependent on system
implementation; some of the details below may not apply to certain system
environments.


8.1 Three Models of Memory: SO, TSO, PSO


The SPARC Architecture defines three views of memory: Strong Ordering
(SO), Total Store Ordering (TSO), and Partial Store Ordering (PSO). Refer to
The SPARC Architecture Manual for a comprehensive discussion of the
models.


The memory operation of a system depends not only on the processor but also 
on: 


Bus or Interconnect,
Caches (second, third, and greater levels), and
Memory Organization (Banking and Interleaving).


Following is how the SuperSPARC processor supports the three memory 
models. 


8.1.1 Strong Ordering (SO)


Strong ordering allows for maximum software compatibility. Of the three mod- 
els, SO has the lowest system performance and may have the highest system 
complexity. In this memory model, all transactions are seen in the single global 
order in which they were issued by all processors, caches, and memories in 
the system. It allows for a very simple and intuitive programmer's model. 


If the system supports strong ordering, SuperSPARC can implement strong
ordering by disabling the internal store buffer (set MCNTL.SB = 0). This will
decrease performance in most system environments, particularly
when using SuperSPARC with the MultiCache Controller (MXCC), since all
store operations write through to external caches.


8.1.2 Total Store Ordering (TSO)


Total store ordering is similar to strong ordering: there is a single global
order of store operations across all processors. Furthermore, each
processor's store operations always occur in program order. In general, there is a
global order for loads. This memory model allows most multi-threaded
applications to operate with good performance.


TSO is the normal memory model of SuperSPARC-based systems. TSO is en- 
abled by setting the MCNTL.SB (enabling the store buffer) and clearing 
MCNTL.PSO (disabling partial store ordering). 


8.1.3 Partial Store Ordering (PSO) 


Partial store ordering is the highest performing of the memory models. It re- 
leases stores from the implied ordering of the program, and requires software 
to explicitly mark where store ordering is needed. This model requires the most 
careful use of memory by applications. 
The STBAR (store barrier) instruction will guarantee the order of store
operations before and after it. All stores issued before an STBAR will complete
before any store instruction issued after it. To achieve the equivalent of the TSO model,
an STBAR might need to be inserted prior to every store operation.


PSO is enabled by setting both the MCNTL.PSO and MCNTL.SB bits to 1.
SuperSPARC implements PSO mode in cooperation with external cache and
memory controllers. The PEND signal is sampled in order to determine the
completion of store transactions by the system. See Section 8.7.
8.2 Atomic Operations: SWAPs and LDSTUB


SWAP and LDSTUB instructions are used to implement semaphores and oth- 
er atomic operations in memory. 
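As an illustration of how LDSTUB supports a lock, the sketch below models memory as a byte array and uses a Python lock to stand in for the atomicity that the hardware provides. It is a simulation under those assumptions, not SuperSPARC code: the `Memory` class and the `spin_lock`/`spin_unlock` helpers are hypothetical, and real code would issue LDSTUB directly.

```python
import threading
import time


class Memory:
    """Hypothetical byte-addressed memory with an atomic LDSTUB."""

    def __init__(self, size):
        self.data = bytearray(size)
        self._atomic = threading.Lock()  # stands in for the atomic bus operation

    def ldstub(self, addr):
        """Atomically read a byte and set it to 0xFF; return the old value."""
        with self._atomic:
            old = self.data[addr]
            self.data[addr] = 0xFF
            return old

    def stb(self, addr, value):
        with self._atomic:
            self.data[addr] = value


def spin_lock(mem, addr):
    while mem.ldstub(addr) != 0:  # nonzero: another holder owns the lock
        time.sleep(0)             # yield; real code would spin on a plain load


def spin_unlock(mem, addr):
    mem.stb(addr, 0)


mem = Memory(8)
counter = 0


def worker(n):
    global counter
    for _ in range(n):
        spin_lock(mem, 0)
        counter += 1
        spin_unlock(mem, 0)


threads = [threading.Thread(target=worker, args=(500,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 1000
```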


8.2.1 Atomic Operations in the Store Buffer


Atomic operations force the store buffer to copy out its contents to main 
memory (store buffer copy-out). 


If an exception occurs during the store buffer copy-out caused by an atomic 
operation, the operation is not completed. Instead, a data store error is tak- 
en, which immediately disables the store buffer. The atomic operation may be 
restarted when the CPU returns from the store buffer exception handler (see 
Subsection 10.6.5). 


If the atomic operation itself encounters an exception on either the write or read
access, a data access exception will be reported. SuperSPARC guarantees
that the destination register will not be updated. The system is responsible for
ensuring that the destination memory location, as in any store exception, is not
modified.


8.2.2 Atomic Operations for SuperSPARC Used With the MultiCache Controller 


Atomic operations have traditionally been implemented as a locked sequence
of loads and stores (Read-Modify-Write). To support higher-performance
packet-switched buses, SuperSPARC implements a true swap operation:
it supplies the new data to be written along with the request for the
current memory data. This is done to ensure that SuperSPARC receives the
current value of the old data. The system or external cache logic must be capable
of accepting this transaction. This is made possible by the definition of
SPARC's atomic operators: the data to be written is independent of what is
read.


8.2.3 Atomic Operations for SuperSPARC Directly on MBus 
Since the SuperSPARC processor connected directly to MBus operates with
a copy-back, write-allocate cache protocol, the operation of atomic
transactions is simpler than when the processor is used with MXCC.


For cacheable references in MBus mode, no special bus operations are done
for atomic transactions. They are implemented as a simple sequence of reads
and writes. SuperSPARC will read and acquire ownership of the data being
referenced, and all operations will occur within the internal cache. If the data is
shared, a Coherent Invalidate (CI) bus transaction will be issued to acquire
ownership.


Non-cacheable references for SuperSPARC connected directly to MBus oper- 
ate more traditionally. They will appear on the bus as a locked read-write se- 
quence. The LOCK bit within the MBus address field will be set, and bus arbi- 
tration will not be released between the read and write. 
8.2.4 Alternate Space Atomics


Swap alternates to ASI locations other than 0x08-0x0b and 0x20-0x2f will
cause data access exceptions. Alternate atomic operations to ASIs
0x08-0x0b and 0x20-0x2f are handled the same as ordinary atomic
operations.
8.3 Load and Store Alternates


Loads and stores to alternate address spaces are generally performed for low-
level control of SuperSPARC and the external system. Appendix B contains
a summary list of valid ASIs.


Nearly all ASI operations (both LDA and STA) cause a store buffer copy-out
before starting execution. Exceptions are listed in Table 8-1.


Table 8-1. ASI Operations That Do NOT Cause Store Buffer Copy-Out
For example, context register writes cause the store buffer to copy out,
preventing the store buffer from containing operations that belong to contexts
other than the current one. This allows data store exceptions to be associated
with the faulting process more easily.





If an exception occurs during the store buffer copy-out caused by an LDA/STA,
the operation is not completed. A data store exception is taken, which
immediately disables the store buffer. The STA operation may be restarted
when the CPU returns from the store buffer trap handler.


Alternate space transactions through ASIs 0xa and 0xb are treated as normal
load and store operations. ASIs 0x8 and 0x9 are treated as instruction
accesses. These accesses are translated by the MMU and may trigger a table
walk. If the table walk encounters an invalid or reserved PTE or PTP, a trap will
be taken.


Transactions through the pass-through ASI space (0x20-0x2f) are treated as
normal loads and stores, except that they are not translated by the MMU (see
Section 9.9). For these operations, cacheability is determined by the
MCNTL.AC (alternate cacheable) bit.
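The ASI behavior described in this section, together with the alternate-atomic restriction of Subsection 8.2.4, can be summarized as a small classifier. Only the ASI values named in the text are covered; every other ASI falls into `OTHER` (Appendix B lists the full set), and the enum and function names are hypothetical.

```python
from enum import Enum


class AsiKind(Enum):
    INSTRUCTION = "instruction access, MMU-translated"  # ASIs 0x8, 0x9
    DATA = "normal data access, MMU-translated"         # ASIs 0xa, 0xb
    PASS_THROUGH = "pass-through, not translated"       # ASIs 0x20-0x2f
    OTHER = "other; see Appendix B"


def classify_asi(asi):
    if asi in (0x08, 0x09):
        return AsiKind.INSTRUCTION
    if asi in (0x0A, 0x0B):
        return AsiKind.DATA
    if 0x20 <= asi <= 0x2F:
        return AsiKind.PASS_THROUGH
    return AsiKind.OTHER


def swap_alternate_allowed(asi):
    """Alternate atomics outside 0x08-0x0b and 0x20-0x2f take a data access exception."""
    return classify_asi(asi) is not AsiKind.OTHER


print(classify_asi(0x25).name, swap_alternate_allowed(0x30))  # PASS_THROUGH False
```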
8.4 Non-Cacheable Loads and Stores


The cacheability of memory references is determined as described in Subsec- 
tions 10.2.2 and 10.4.3. 


Non-cacheable loads cause the contents of the store buffer to be copied out 
to memory before proceeding. This is done to ensure proper memory ordering 
of accesses to I/O space. 


If an exception occurs on the store buffer copy-out caused by a non-cacheable 
load, the load operation is not completed. A data_store_exception is taken, 
which immediately disables the store buffer. The load operation may be re- 
started when the CPU returns from the store buffer exception handler. 


Unlike loads, non-cacheable stores do not normally force the store buffer to 
copy out. They are simply placed in the store buffer, if enabled, like cacheable 
stores. 
8.5 Page Table Memory Operations


The Memory Management Unit (MMU) page tables in system memory should 
be accessed cautiously because there may be conflicts between software and 
hardware accesses. This section describes how to access the page tables. 


8.5.1 Hardware Use of Page Tables 


The MMU hardware autonomously reads the page tables to perform transla- 
tions and modifies them to keep Referenced bits (R) and Modified bits (M) up 
to date. 


When the SuperSPARC MMU table walk hardware accesses the page tables,
it does so in a way that guarantees consistency between processors in a
multiprocessor system. Table walks are not performed under locks, and hardware
and software can interfere with each other if accesses do not follow the
algorithm laid out in Subsection 8.5.3.


Accesses by the MMU hardware to update the referenced and modified (R&M) 
bits in a page table entry are made using standard write or swap operations, 
and without bus locking. See Section 9.4 for more detailed information on R&M 
updates. 


8.5.2 System Software Use of Page Tables 


System software accesses page tables to check statistics and remap physical
memory.


When only system software accesses page tables in memory, consistency is
guaranteed by the normal cache consistency mechanisms, as well as by such
standard software access controls as semaphores. Unfortunately,
SuperSPARC's table walk hardware has no way to recognize these software-locking
conventions. Subsection 8.5.3 describes how software can access the page
tables without introducing inconsistencies.


8.5.3 Hardware/Software Page Table Consistency 
Since both hardware and software accesses to the page tables may be in prog-
ress simultaneously, some algorithm must be used to guarantee that these two 
sources (as well as multiple instances of both of them due to multiple proces- 
sors) will not interfere with each other. 


Inconsistency can occur in several ways. The usual way is when software 
changes (e.g., invalidates, etc.) a page mapping and the table walk hardware 
has already read the table entry into the TLB. In this situation, it is the responsi- 
bility of the system software to perform an MMU flush (or Demap) operation 
to force the page table to be re-read by the MMU. See Subsection 9.8.2. This 
is sufficient in a uniprocessor environment. In a multiprocessor environment, 
the flush operation must be done on every processor in the system. 
Note:


Software must guarantee that only a single Demap operation is in progress
at any one time across the entire system. Inconsistent operation will result
if two Demaps are received by a processor at the same time (including internal
Demap requests).





In XBus systems the local flush operation is automatically broadcast to all
processors. There is no need to interrupt remote processors to issue local flush
transactions. In MBus systems all processors must be interrupted and told to
do their own local flush operation. The broadcast Demap capability of XBus
is recommended for large, higher-performance multiprocessor systems.


A more difficult problem arises when the MMU hardware must rewrite a page
table entry to set or clear the R or M bits. The hardware must be prevented from
overwriting a modification that system software has just completed.


A general algorithm that prevents inconsistency in both situations is presented
in Example 8-1. The exact implementation of this code is system-dependent.
The implementation of lock and unlock operations is memory-model
dependent; see The SPARC Architecture Manual for sequences matching the three
memory models.


Example 8-1. Generalized Safe Page Table Update Algorithm


/* Acquire exclusive page table access */
Lock

/* Any R&M updates will accumulate here */
RMAccum = 0

/* Set PTE to zero temporarily */

Loop: Reg = 0

/* Write 0, read back current PTE */
Swap (TargetPTE, Reg)

/* Flush ALL references to this PTE in system */
FlushAllMMUs (TargetPTE)

/* Catch any late R&M changes */
RMAccum = RMAccum OR Reg

/* Continue until it's really zero */
if (TargetPTE <> 0) go to Loop

/* Safe to write new value */
TargetPTE = NewPTE

/* Release lock on page table access */
Unlock


Operationally, the algorithm may take several iterations to complete. The
FlushAllMMUs operator is a system-dependent mechanism that executes a flush
operation on all MMUs in the system. System software should use the RMAccum
value as the final value of the PTE before modification. This guarantees that
no page table status information is lost.
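To show why the loop and the RMAccum accumulation are needed, the algorithm can be modeled in software. The following is a minimal Python sketch, not SuperSPARC code: the page table is a dictionary, `swap` stands in for the atomic SWAP instruction, a `threading.Lock` stands in for the system-dependent Lock and Unlock, and `flush_all_mmus` is a hypothetical caller-supplied callback standing in for FlushAllMMUs. R and M are assumed to be PTE bits 5 and 6, as in the SPARC Reference MMU layout.

```python
import threading

# Assumed PTE bit positions (SPARC Reference MMU layout).
PTE_R = 1 << 5
PTE_M = 1 << 6

# Stand-in for the system-dependent Lock/Unlock of Example 8-1.
page_table_lock = threading.Lock()


def swap(table, index, value):
    """Model of the atomic SWAP: store value, return the previous contents."""
    old = table[index]
    table[index] = value
    return old


def safe_pte_update(table, index, new_pte, flush_all_mmus):
    """Replace table[index] with new_pte without losing late R/M updates.

    Returns RMAccum, the OR of every value read back from the PTE, which
    software should treat as the entry's final value before modification.
    """
    with page_table_lock:                    # Lock; released when block exits
        rm_accum = 0                         # R&M updates accumulate here
        while True:
            reg = swap(table, index, 0)      # write 0, read back current PTE
            flush_all_mmus(index)            # flush every MMU's copy
            rm_accum |= reg                  # catch late R and M changes
            if table[index] == 0:            # continue until really zero
                break
        table[index] = new_pte               # safe to write the new value
    return rm_accum
```

If the callback models an MMU that still held the old entry and writes it back with M set during the first flush, the loop runs a second time and the returned RMAccum contains the M bit, so software does not lose the fact that the page was modified.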
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8.6 Prefetch Exception Handling 
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For most cache misses on load or instruction fetch operations, SuperSPARC
performs a block read, which loads four double-words from memory into the
cache in a burst transaction. SuperSPARC always requests the double-word
containing the word it needs as the first transfer of the burst, then reads the
remaining double-words in order, with addresses wrapping modulo four. A bus
error may be encountered on any one of the double-word transfers.
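The wrap-around request order can be illustrated with a short Python sketch. It assumes a 32-byte block (four 8-byte double-words, as described above) and reads "addressed modulo four" as wrap-around within the block; `burst_order` is a hypothetical helper, not part of any SuperSPARC interface.

```python
BLOCK_BYTES = 32      # four double-words per block read
DOUBLE_WORD = 8


def burst_order(addr):
    """Double-word addresses of a block read, demand double-word first."""
    base = addr & ~(BLOCK_BYTES - 1)                  # start of the block
    first = (addr & (BLOCK_BYTES - 1)) // DOUBLE_WORD  # demand double-word
    count = BLOCK_BYTES // DOUBLE_WORD
    return [base + ((first + i) % count) * DOUBLE_WORD for i in range(count)]
```

For example, a miss on address 0x101C yields the double-words at 0x1018, 0x1000, 0x1008, and 0x1010, in that order.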


Demand fetches immediately return data that is required by the processor. If
a bus error occurs on a demand fetch, an exception is reported to the pipeline,
causing an instruction access exception to occur.


Non-demand fetches are the remaining words in the burst, also called
prefetches. If a bus error occurs on prefetch data, it will not be reported to the
pipeline. In this case, the entire cache line referenced by this transaction will
be invalidated. The demand fetch will have been satisfied, but none of the
additional prefetch data is in the processor. If that data is required by the
processor in the future, it will be fetched again, as a demand fetch. If the error
at that location persists, it will then be reported to the pipeline as an
instruction access exception.
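The reporting rule for errors within one block read can be summarized in a small Python sketch. The function name and its string results are illustrative only; transfer 0 is the demand fetch and transfers 1 to 3 are the prefetches.

```python
def block_read_outcome(error_transfers):
    """Classify a four-double-word block read.

    error_transfers holds the burst positions (0-3) that saw a bus error;
    position 0 is the demand fetch and positions 1-3 are prefetches.
    """
    if 0 in error_transfers:
        return "exception"        # demand data failed: reported to the pipeline
    if error_transfers:
        return "invalidate_line"  # prefetch failed: line dropped, not reported
    return "line_filled"          # all four double-words loaded
```

A later demand fetch of the dropped data goes through the same classification, which is how a persistent prefetch error eventually surfaces as an exception.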


Information about errors of this sort may be accumulated outside the
processor for repair or for gathering statistics. If desired, external controllers
may raise an interrupt to inform system software of these errors. System
software may then attempt to eliminate the error (by demapping pages, etc.)
before the data is actually required.
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8.7 Memory Model Support (PEND) 


The operation of SuperSPARC's store buffer and the way in which SuperSPARC
uses PEND depend on the memory model selected.


When the store buffer is on (MCNTL.SB = 1), SuperSPARC may be operated 
in TSO or PSO mode. When in TSO mode, SuperSPARC always waits for 
PEND to go high before issuing another store to either VBus or MBus. 


When in PSO mode, SuperSPARC waits for PEND to go high before issuing
a transaction according to Table 8-2. Note that for some transactions,
SuperSPARC waits only if there is a prior store barrier (STBAR) instruction. In
PSO, STBAR becomes a barrier in the store buffer so that all stores issued
before STBAR are completed before any stores after STBAR are sent from the
store buffer to the bus.


Table 8-2. VBus Transactions That Wait On PEND 
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In the above table the following notation is used:


Wait                  Always wait for PEND to go high before beginning
                      the transaction.
Only if prior STBAR   Wait for PEND to go high if a previously issued
                      STBAR is in the store buffer.


Note that internal ASI operations (both loads and stores) will wait for
deassertion of the PEND signal. There is one exception to this: ASI 0x4c (the
"Action on Event" register), which does not depend on PEND.
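The PEND rules above, for operation with the store buffer on (MCNTL.SB = 1), can be restated as a Python sketch. The body of Table 8-2 is not reproduced here, so the PSO rule for a given transaction is passed in as one of the two notations defined above ("Wait" or "Only if prior STBAR"). The function names are hypothetical.

```python
ACTION_ON_EVENT_ASI = 0x4C   # the one internal ASI that ignores PEND


def store_waits_for_pend(mode, pso_rule="Wait", stbar_in_buffer=False):
    """Return True if a store must wait for PEND to go high before issue."""
    if mode == "TSO":
        return True                       # TSO: every store waits
    if mode == "PSO":
        if pso_rule == "Wait":
            return True
        if pso_rule == "Only if prior STBAR":
            return stbar_in_buffer        # wait only behind a buffered STBAR
        raise ValueError(f"unknown Table 8-2 rule: {pso_rule!r}")
    raise ValueError(f"unsupported memory model: {mode!r}")


def internal_asi_waits_for_pend(asi):
    """Internal ASI loads and stores wait for PEND, except ASI 0x4c."""
    return asi != ACTION_ON_EVENT_ASI
```

The design keeps the per-transaction table lookup outside the function, so the same logic applies whichever Table 8-2 entry a transaction falls under.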
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When SuperSPARC is used with the MXCC, none of the above changes.
MXCC controls SuperSPARC's PEND pin. Since the external cache is copy-back,
cacheable stores do not generate immediate stores to the bus. When a block is
replaced on an external cache miss, the new data is read before any dirty data
in the replaced block is stored to the system bus. If the system bus is MBus,
stores always occur in order because MBus is busy until the store completes.
The XBus interface provides a packet interface, which can allow multiple
pending stores to be issued while waiting for acknowledgements to complete.
MXCC asserts PEND (= L) to SuperSPARC whenever any store has been sent to
the system and has not been acknowledged. MXCC will issue multiple
Non-Cacheable (NC) stores only if they are to the same page (presumably
serialized by the device receiving the packets). If the CCCR.MC bit (multiple
command enable) is clear, MXCC always forces SuperSPARC to wait until the
previous operation has completed, greatly limiting the opportunities for
out-of-order execution by the system.


In summary, while SuperSPARC supports systems that perform stores out of 
order, SuperSPARC always issues stores in order, and the presence and ex- 
tent of out-of-order stores is controlled by the system. The system must use 
the PEND pin to signal to SuperSPARC that there are outstanding stores. Su- 
perSPARC uses the PEND pin to ensure that the programmer's intentions in 
the current memory model are obeyed. When the MultiCache Controller is 
used, it controls the PEND pin. MXCC does not reorder stores, can only issue 
multiple stores in XBus mode, and generally does not issue NC stores out of 
order. 
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The SuperSPARC processor (SSP) implements a Memory Management Unit
(MMU) with a 64-entry fully associative TLB, compatible with the SPARC
Reference MMU Specification.


The MMU translates 32-bit virtual addresses into 36-bit physical addresses.


The mapping is done in units of 4K-byte pages, 256K-byte segments, 16M-byte
regions, or 4G-byte contexts.
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9.1 Memory Management Unit Fundamentals 
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The MMU serves to perform a number of important functions in sophisticated 
computer systems: 


❑ Addressing Contexts


Addresses are unique to hardware contexts, corresponding roughly to 
software processes. 


❑ Relocation


Processor-generated addresses are translated to main memory 
addresses. The translations are relocatable in small blocks called pages. 


❑ Controlled Sharing


Portions of the address spaces for pairs of contexts can be set up to be 
shared between them. The unit of sharing is a page. 


❑ Protection


Access can be confined to addressable memory. Each context can be 
individually allowed or disallowed read, write, and execute access to 
pages. 


❑ Virtualization


A software handler can process translation failures by locating the missing 
page on backing store, allocating main memory for it, copying it from back- 
ing store into main memory, setting up a translation for it to the newly 
allocated main memory, and retrying the access. 


❑ Supervisory Software Support


Special modes and protection modes support supervisory software 
access to other contexts. 


SuperSPARC contains a sophisticated MMU based on the SPARC Reference 
MMU in The SPARC Architecture Manual. This MMU performs translations by 
stepping through a tree-structured translation table in main memory called the 
page tables. When a translation is found, it is cached in a Translation 
Lookaside Buffer (TLB) in the MMU so that the memory accesses to the page
tables can be avoided in subsequent accesses.


This chapter describes the page tables and the operation of the MMU and the 
TLB. 
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9.2 Address Translation 


This section briefly describes the software view of memory mapping using the 
SuperSPARC MMU. For background, refer to the SPARC Reference MMU 
Specification in The SPARC Architecture Manual. 


Physical addresses are composed of an offset within a page and a Physical 
Page Number (PPN), while virtual addresses are composed of an offset within 
a page and a Virtual Page Number (VPN). 


The MMU translates 32-bit virtual addresses and 16-bit context numbers into 
36-bit physical addresses by accessing up to four levels of page tables in 
memory. Normally, this translation is cached in the on-chip 64-entry TLB. 
When the translation entry is missing from the TLB, the MMU table walk hard- 
ware automatically retrieves the translation from the page tables in memory. 
Figure 9-1 describes the full structure of these page tables. 


Figure 9-1. Address Translation Utilizing Four Levels of Page Tables

Each virtual address space is identified by a context number that is kept in the 
context register. Virtual addresses kept in the TLB are tagged with a 16-bit con- 
text number. The effective size of the context register is variable between 10 
and 16 bits. This is so that page tables can be smaller in systems with less 
memory. 
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The page tables can contain page table pointers (PTPs) or page table entries
(PTEs). A PTE is distinguished from a PTP by the two low-order bits of the table
entry (see Figure 9-2). A PTP contains the physical address of the next page
table level, while a PTE contains the physical address of the page with its
access rights.


9.2.1 Page Table Entry 
Figure 9-2. Page Table Entry


PPN Physical Page Number. This is the high-order 24 bits of the 36-bit 
physical address. If the PTE maps a 256K-byte segment, 
16M-byte region, or 4G-byte context, the lower 6, 12, or 20 bits, 
respectively, of the PPN are ignored. 


C Cacheable. If this bit is set to 1, the page is cacheable in the
SuperSPARC internal (and external) caches. If it is zero, the page
is not cacheable. SuperSPARC asserts the CCHBL pin for transactions
involving virtual addresses with the C bit set.


M Modified. When a page is accessed for writing and the modified 
bit is not set, the MMU sets the modified bit (M) in both the TLB 
and the main memory page table entry. 


R Referenced. This bit is set to 1 by the hardware when the page 
is accessed (on a read or a write) and the PTE is missing from 
the TLB. As with M, both the TLB and the memory copies are set. 


ACC Access Permissions. This bit field is encoded as shown in
Table 9-1. The MMU checks whether an access is authorized
according to the mode in which the instruction is executed (user
or supervisor) and the type of access (instruction or data
reference). If the permissions are violated, a data or instruction
access exception is signalled. For more information on access
permissions, see MFSR.AT in Subsection 9.12.3.7.
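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Table 9-1. Access Permission Codes

ACC   User Access           Supervisor Access
0     Read Only             Read Only
1     Read/Write            Read/Write
2     Read/Execute          Read/Execute
3     Read/Write/Execute    Read/Write/Execute
4     Execute Only          Execute Only
5     Read Only             Read/Write
6     No Access             Read/Execute
7     No Access             Read/Write/Execute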
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ET Entry Type. This field is used to distinguish a PTE from a PTP
and to indicate whether a table entry is valid. The encoding is as
shown in Table 9-2; a valid PTE always has an ET equal to 2.


Table 9-2. Entry Type Encoding

ET   Entry Type
0    Invalid
1    Page Table Pointer (PTP)
2    Page Table Entry (PTE)
3    Reserved
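A minimal Python sketch of decoding a PTE and checking an access against its ACC field follows. It assumes the SPARC Reference MMU bit layout for the PTE (PPN in bits 31:8, C in bit 7, M in bit 6, R in bit 5, ACC in bits 4:2, ET in bits 1:0) and the Reference MMU ACC encoding; the helper names are hypothetical.

```python
# ACC value -> (user rights, supervisor rights); r = read, w = write, x = execute.
ACC_RIGHTS = {
    0: ("r", "r"),
    1: ("rw", "rw"),
    2: ("rx", "rx"),
    3: ("rwx", "rwx"),
    4: ("x", "x"),
    5: ("r", "rw"),
    6: ("", "rx"),
    7: ("", "rwx"),
}

ET_PTE = 2


def decode_pte(pte):
    """Split a 32-bit PTE into its fields (assumed bit positions)."""
    return {
        "ppn": (pte >> 8) & 0xFFFFFF,   # PA[35:12]
        "c": (pte >> 7) & 1,
        "m": (pte >> 6) & 1,
        "r": (pte >> 5) & 1,
        "acc": (pte >> 2) & 0x7,
        "et": pte & 0x3,
    }


def access_allowed(pte, access, supervisor):
    """access is "r", "w", or "x"; returns False on a permission violation."""
    fields = decode_pte(pte)
    if fields["et"] != ET_PTE:
        raise ValueError("not a valid PTE")
    user_rights, supervisor_rights = ACC_RIGHTS[fields["acc"]]
    return access in (supervisor_rights if supervisor else user_rights)
```

With ACC = 5, for instance, a user-mode store is refused while a supervisor-mode store is allowed; with ACC = 6 or 7 no user access is permitted at all.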
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9.2.2 Page Table Pointer 


Figure 9-3. Page Table Pointer 


PTP Page Table Pointer. Bits 2 through 31 contain the 30-bit page
table pointer. This is the physical address of the base of a
next-level page table.

ET Entry Type. Distinguishes between a PTE and a PTP. The ET field
is encoded as in Table 9-2. In a PTP, ET is always equal to 1.





Note: 


The page tables must be aligned on boundaries equal to their sizes. Low-order
bits of the PTP field must be 0. PTPs must point to tables aligned to their
natural size.
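The walk over these entries can be sketched in Python. This is a model under stated assumptions, not a description of the SuperSPARC hardware: physical memory is a dictionary, the context table's physical base address is passed in directly rather than decoded from the Context Table Pointer register, PTP bits 31:2 are taken to hold physical address bits 35:6 of the next table (the SPARC Reference MMU layout), and the level 1, 2, and 3 tables are indexed by VA[31:24], VA[23:18], and VA[17:12]. The TLB, the cached root pointer and level-2 PTP, and R&M updates are omitted.

```python
ET_INVALID, ET_PTP, ET_PTE, ET_RESERVED = 0, 1, 2, 3


def ptp_to_table_pa(ptp):
    """PTP bits 31:2 hold bits 35:6 of the next table's physical address."""
    return (ptp >> 2) << 6


def table_walk(memory, ctx_table_pa, context, va):
    """Return (pte, level) for va, or raise LookupError.

    memory maps 4-byte-aligned physical addresses to 32-bit entries.
    Level 0 is the context table; levels 1-3 are indexed by
    VA[31:24], VA[23:18], and VA[17:12].
    """
    indices = [
        context,
        (va >> 24) & 0xFF,
        (va >> 18) & 0x3F,
        (va >> 12) & 0x3F,
    ]
    table_pa = ctx_table_pa
    for level, index in enumerate(indices):
        entry = memory.get(table_pa + 4 * index, 0)   # absent -> invalid entry
        et = entry & 0x3
        if et == ET_PTE:
            return entry, level                       # large or 4K mapping found
        if et != ET_PTP or level == 3:
            raise LookupError(f"walk failed: ET={et} at level {level}")
        table_pa = ptp_to_table_pa(entry)             # descend one level
```

A PTE found at level 0, 1, or 2 ends the walk early; this is the large linear mapping case described in the next section.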
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9.3 Large Linear Mappings 


The MMU supports mapping sizes larger than the 4K-byte page size. This is 
done by configuring an entry in the context table, level 1 table, or level 2 table
as a PTE (ET=2).


If a PTE is found in the context table, the virtual-to-physical mapping is done 
as indicated in Figure 9-4 for a 4G-byte context. 


Figure 9-4. Address Translation With Maximum Page Size


If a PTE is found in a Level 1 table, the virtual-to-physical mapping is done for
a 16M-byte region, as shown in Figure 9-5.
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Figure 9-5. Address Translation With 16M-byte Page


If a PTE is found in a level 2 table, the virtual-to-physical mapping is done for 
a 256K-byte segment, as shown in Figure 9-6. 


Figure 9-6. Address Translation With 256K-byte Page


The TLB is fully associative; it matches all entries simultaneously. Each entry
may be of any of the mapping sizes.
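The PPN description in Subsection 9.2.1 determines how the physical address is formed for each mapping size: the ignored low-order PPN bits are replaced by virtual address bits. The Python sketch below assumes the PPN occupies PTE bits 31:8 and takes the table level at which the PTE was found (0 for a 4G-byte context, 1 for a 16M-byte region, 2 for a 256K-byte segment, 3 for a 4K-byte page); `translate` is a hypothetical name.

```python
# Low-order PPN bits ignored for a PTE found at each table level.
IGNORED_PPN_BITS = {3: 0, 2: 6, 1: 12, 0: 20}


def translate(pte, level, va):
    """Form the 36-bit physical address for va from a PTE at the given level."""
    ppn = (pte >> 8) & 0xFFFFFF                  # PA[35:12]
    offset_bits = 12 + IGNORED_PPN_BITS[level]   # 12, 18, 24, or 32
    offset_mask = (1 << offset_bits) - 1
    return ((ppn << 12) & ~offset_mask) | (va & offset_mask)
```

For a 4G-byte context mapping only PA[35:32] comes from the PTE, so the whole 32-bit virtual address passes through unchanged.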
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9.4 MMU-Referenced and Modified Bits 


MMU R&M Updates 


The referenced bit (R) of a PTE is set to 1 whenever its page is accessed by 
the MMU during cache miss processing or when an entire probe is initiated. 
If the Referenced bit (R) is already set, it is not set again. 


The Modified bit (M) is checked when the PTE is accessed as a part of the
execution of a store instruction. If the M bit is clear in the MMU entry, the M
bit in both the TLB and the copy of the PTE in main memory is set to 1.


The R and M bits must be updated in main memory in a manner that guaran- 
tees their consistency in a multiprocessor environment. For a discussion of 
possible algorithms, see Section 8.5. 


Writes to PTEs in main memory by the MMU are required by the SPARC
Reference MMU architecture to be synchronous; thus they block the execution
pipeline. They also force a store buffer copy-out to preserve the sequence of
writes (see Section 10.6). This copy-out is initiated when the memory reference
is present at the E0 stage of the execution pipeline.


If an exception occurs on the store buffer copy-out caused by an R&M update,
the R&M update operation is not completed. A data store exception is taken,
which immediately disables the store buffer. The load, store, or fetch that
caused the R&M update will be restarted when the CPU returns from the store
buffer trap handler; this in turn will eventually restart the R&M update.


The table walk hardware within SuperSPARC uses a standard write operation
when setting both R and M bits in memory. If only the R bit must be set, atomic
memory transactions are used to ensure that another processor is not
simultaneously attempting to set both bits. Without this protection, a PTE with
both R and M bits set by another processor could be overwritten by an
updated PTE with only the R bit set. This is prevented by using SWAP
transactions for R-bit-only updates. Note also that the only combinations
that SuperSPARC will ever write back to the PTE are {R=1,M=0} and
{R=1,M=1}.


Using SWAP for R-bit updates is essentially the only page table consistency 
algorithm implemented in hardware. All other cases of page table consistency 
must be implemented in software as described in Section 8.5. 
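The write-back rule can be summarized in a Python sketch that decides which memory update the table walk hardware performs. It assumes R and M are PTE bits 5 and 6, and treats any update that must set M as an ordinary write of {R=1,M=1}, consistent with the two combinations listed above. `rm_update` is a hypothetical name, and the TLB state that triggers the update is not modeled.

```python
R_BIT = 1 << 5
M_BIT = 1 << 6


def rm_update(pte, is_store):
    """Return (kind, new_pte); kind is None, "swap", or "write"."""
    need_m = is_store and not (pte & M_BIT)
    need_r = not (pte & R_BIT)
    if need_m:
        return "write", pte | R_BIT | M_BIT   # ordinary write of {R=1,M=1}
    if need_r:
        return "swap", pte | R_BIT            # atomic SWAP setting R only
    return None, pte                          # nothing to write back
```

Using SWAP only for the R-only case is what keeps a concurrent {R=1,M=1} write from another processor from being silently undone.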
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9.5 TLB Replacement Policy 


The MMU has 64 TLB entries. When a new PTE entry is brought into the MMU 
on a TLB miss, it must be stored in the TLB. If one or more entries are invalid, 
the new translation will be stored in the lowest-numbered invalid entry. When 
all entries are valid, one of the valid entries must be selected for replacement 
by the new entry. SuperSPARC's TLB uses a limited-history LRU policy.


Each TLB entry has a used bit associated with it. The used bit is set for any 
TLB entry that has a TLB hit. When all entries have their used bits set, all used 
bits (except the last one to be set and those that are locked) are cleared. This 
is the case when all past history is lost. To select an entry to be replaced when 
all TLB entries are valid, the lowest-numbered entry for which the used bit is 
not set will be chosen. Since a TLB hit causes the used bit to be set, this repre- 
sents the least recently used entry based on the limited history available. 


When an entry is invalidated or flushed from the TLB, its corresponding used
bit is cleared. When a demap-all operation is done to invalidate all TLB entries,
all used bits are cleared. In addition to the used bits, the replacement policy
also checks the corresponding lock bit (one per TLB entry). If the lock bit for
an entry is set, then, regardless of the used bits, that entry will never be
replaced. An invalid entry with its lock bit set can still be replaced, and the
newly written entry becomes locked, since the lock bit remains set. If all entries
have their lock bits set, no replacement takes place, and the newly brought in
PTE is not stored in the TLB. Locking all the entries in the TLB must be avoided:
a translation finishes only after the table walk has stored its PTE in the TLB,
so the access misses again after every table walk, causing an infinite
table-walk loop.




Notes: 
Setting all lock bits in the TLB can lead to deadlock and is not recommended. 


Allowing the lock bit to be set for invalid entries can lead to inconsistent op- 
eration and is therefore not recommended. Since these entries remain 
locked after a table walk writes a new entry into them, that entry will remain 
in the TLB. This is true even after a demap operation, which should invalidate 
it. It is recommended that lock bits be set only in conjunction with explicit 
writes to that TLB entry by supervisor software. 


The lock bits are cleared by hardware reset. 
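The replacement policy can be modeled with one valid bit, one used bit, and one lock bit per entry. The Python sketch below is a software model, not the hardware implementation. Two details are assumptions not stated above: writing a new translation marks the entry valid without setting its used bit, and locked entries count as used when deciding whether all used bits are set.

```python
class TlbReplacement:
    """Limited-history LRU victim selection for an n-entry TLB (sketch)."""

    def __init__(self, entries=64):
        self.valid = [False] * entries
        self.used = [False] * entries
        self.locked = [False] * entries

    def fill(self, i):
        # Assumption: writing a new translation marks the entry valid only.
        self.valid[i] = True

    def hit(self, i):
        self.used[i] = True
        # Assumption: locked entries count as "used" for the wrap check.
        if all(u or l for u, l in zip(self.used, self.locked)):
            for j in range(len(self.used)):
                if j != i and not self.locked[j]:
                    self.used[j] = False        # past history is lost

    def invalidate(self, i):
        self.valid[i] = False
        self.used[i] = False

    def demap_all(self):
        self.valid = [False] * len(self.valid)
        self.used = [False] * len(self.used)

    def choose_victim(self):
        for i, v in enumerate(self.valid):
            if not v:
                return i                        # lowest invalid entry, even if locked
        for i in range(len(self.valid)):
            if not self.used[i] and not self.locked[i]:
                return i                        # lowest unused, unlocked entry
        return None                             # nothing replaceable: PTE not stored
```

A `None` result corresponds to the all-locked case in which the new PTE is not stored and the table walk would repeat.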





Subject to Change Without Notice 





9.6 Root Pointer and PTP Level2 Cache


To reduce the time required for each table walk to access the PTEs, the
SuperSPARC MMU caches two special pointers: the root pointer and the PTP
from level 2 (PTP2).


The root-pointer for the process in execution is cached. It is invalidated on ev- 
ery context switch, and the first table-walk for the new process will be used to 
cache in the new root pointer. Caching the root pointer saves the MMU from 
performing a level of table walk for each TLB miss for that particular context. 
The root-pointer entry is qualified by a valid bit and implicitly corresponds to 
the context in the context register. The valid bit for this entry is cleared on
context register write, context table pointer write, demap-entire, and demap for a
context that matches the context register value.


Caching a level-2 PTP can save memory references during MMU table walks.
The SuperSPARC MMU caches one level-2 PTP. The cached level-2 PTP is
qualified by a valid bit and implicitly corresponds to the context in the context
register. This second-level cached PTP entry is used only for table walks,
R&M bit updates, and probe-entire operations. Other operations still go
through the three-level table-walk mechanism. The valid bit for this entry is
cleared on context register write, context table pointer write, a table-walk
operation not using the cached PTP (in this case, a new entry will be written),
demap-entire, demap for a context that matches the context register value, and
level-1 and level-2 demaps for which this PTP has a match. The PTP entry is
cached on a new table-walk operation and is not cached if the level-2 entry is
a PTE.
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9.7 TLB Hit Criteria


A TLB entry hits if its validity, virtual address, and access criteria are all
met. The criteria for a TLB hit in the MMU are shown in Table 9-3.


Table 9-3. TLB Hit Criteria

Mapping      Virtual Address      Access
4K-byte      VPN = VA[31:12]      ACC = 6 or ACC = 7 or Context = MCTX.CTX
256K-byte    VPN = VA[31:18]      ACC = 6 or ACC = 7 or Context = MCTX.CTX
16M-byte     VPN = VA[31:24]      ACC = 6 or ACC = 7 or Context = MCTX.CTX
4G-byte      None                 ACC = 6 or ACC = 7 or Context = MCTX.CTX


In all cases, the entry must be valid (V=1). 











VPN = VA[31:xx] indicates that the corresponding bit fields of the virtual ad- 
dress issued by the processor and the virtual address contained in a TLB Virtu- 
al Page Number (VPN) match; see Figure 9-16. 


Context = MCTX.CTX denotes that the contents of the context register (see
Subsection 9.12.2) and the context field in the MMU entry match. Note that
context is not compared for any page classified as a supervisor page by the
ACC field being either 6 or 7. In this manner, supervisor pages are present in
all contexts simultaneously.
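The hit criteria of Table 9-3 can be expressed as a Python predicate over a hypothetical TLB entry record with valid, virtual-address tag, table level, ACC, and context fields. The field names are illustrative; the comparison widths follow the mapping sizes (VA[31:12], VA[31:18], VA[31:24], or no address comparison for a 4G-byte context).

```python
# Lowest VA bit of the VPN compared, by table level at which the PTE was found.
VPN_LOW_BIT = {3: 12, 2: 18, 1: 24, 0: 32}


def tlb_hit(entry, va, mctx_ctx):
    """Apply the Table 9-3 criteria to one hypothetical TLB entry record."""
    if not entry["valid"]:                       # V=1 is always required
        return False
    low = VPN_LOW_BIT[entry["level"]]
    if (va >> low) != (entry["vpn_va"] >> low):  # compare VA[31:low] only
        return False
    # Supervisor pages (ACC 6 or 7) match in every context.
    return entry["acc"] in (6, 7) or entry["context"] == mctx_ctx
```

For a level-0 entry the shift by 32 leaves nothing to compare, so only the access column decides the hit.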
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9.8 MMU Probe and Demap 


The ASI value 0x03 is used to invalidate or probe entries in the MMU. An invali- 
dation of an MMU entry is called a demap. 


A probe is done with a load alternate instruction, while a demap is done with 
a store alternate instruction. The data of the load or store alternate is ignored.
The address format for both cases is: 


Figure 9-7. Probe and Demap Address Format


9.8.1 MMU Probe


The various bit fields have the following meanings: 
VDPA Virtual Demap or Probe Address. See Table 9-5 for the
significant virtual address bits.
Type This field specifies the extent of a demap (mapping size) or the
level of the entry probed. See Table 9-4 for the possible levels.
Rsvd Reserved. These bits are ignored. They should be zeros for
compatibility with future versions.
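Forming the ASI 0x03 address for a probe or demap can be sketched in Python. The bit positions (VDPA in bits 31:12, Type in bits 11:8, reserved bits 7:0) and the Type codes 0 to 4 are assumptions taken from the SPARC Reference MMU flush/probe format and should be checked against Figure 9-7 and Table 9-4. The resulting address would be used with a load alternate (probe) or store alternate (demap) instruction to ASI 0x03.

```python
# Assumed Type codes (SPARC Reference MMU); codes 5-7 are not used.
PROBE_DEMAP_TYPE = {
    "page": 0,       # 4K-byte, level 3
    "segment": 1,    # 256K-byte, level 2
    "region": 2,     # 16M-byte, level 1
    "context": 3,    # 4G-byte, level 0
    "entire": 4,
}


def probe_demap_address(va, kind):
    """Compose the ASI 0x03 address; reserved bits 7:0 are left zero."""
    return (va & 0xFFFFF000) | (PROBE_DEMAP_TYPE[kind] << 8)
```

Leaving the reserved bits zero follows the recommendation above for compatibility with future versions.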


The different types of MMU probes that can be performed and the corresponding
returned data are shown in Table 9-4.


Table 9-4. MMU Probe Types
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


For all the probe operations, the TLB is accessed first. If a translation is found 
that complies with the matching criteria, the cached PTE is returned. The PTE 
is returned in the format in Figure 9-2, with PTE.R=1 and PTE.ET=2. The cri- 
teria for matching a TLB entry are shown in Table 9-5. 
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Table 9-5. MMU Probe Operations TLB Matching Criteria

Probe Type            Valid   Context              Virtual Address and Level
Page (4K-byte)        V=1     Context = MCTX.CTX   VPN = VDPA[31:12], level = 3
Segment (256K-byte)   V=1     Context = MCTX.CTX   VPN = VDPA[31:18], level = 2
Region (16M-byte)     V=1     Context = MCTX.CTX   VPN = VDPA[31:24], level = 1
Context (4G-byte)     V=1     Context = MCTX.CTX   level = 0
Entire                V=1     Context = MCTX.CTX   (VPN = VDPA[31:12] & level = 3) |
                                                   (VPN = VDPA[31:18] & level = 2) |
                                                   (VPN = VDPA[31:24] & level = 1) |
                                                   level = 0
Reserved              -       -                    -


In all cases, the entry must be valid (V=1).


VPN = VDPA[31:xx] indicates that the corresponding bit fields of the Virtual De- 
map or Probe Address issued by the processor and the virtual address con- 
tained in a TLB VPN match; see Figure 9-16. 


Context = MCTX.CTX denotes that the contents of the context register (see 
Subsection 9.12.2) and the context field in the MMU entry match. 


Level denotes the level at which the PTE is found. 
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


If the PTE is not found in the MMU, a table walk is initiated. In this case, the
data returned may not be a PTE. The returned data can be a PTP or, if an error
occurs, the value 0. The error cases are slightly different from the ones that
may occur during a regular table walk when the MMU is processing a miss.
They are detailed below. If an error occurs during a probe table walk, no
exception is taken, but the AT field of the MMU fault status register (MFSR.AT)
is set to 1 (load from supervisor data space), the MFSR.FT field to 1 (invalid
address error) or 4 (translation error), and the MFSR.L field to the table level
where the error was detected (see Subsection 9.12.3.7).


❑ Page Probe (4K-byte). For a page probe, the hardware does a table walk
and returns the PTE found in the level 3 table even if this level 3 entry is
invalid (PTE.ET=0). A table walk may not complete correctly for any of the
following reasons:


■ The level 3 entry accessed is not a PTE (PTE.ET=1 or 3);

■ An intermediate-level entry is not a PTP (PTE.ET ≠ 1);

■ A hardware error occurs.

In these cases, the value 0 is returned, and MFSR.FT is set to 4, which
indicates that a translation error has occurred.


❑ Segment (256K-byte) or Region (16M-byte) Probe. For a segment or region
probe, a PTE (PTE.ET=2) or a PTP (PTE.ET=1) is returned from the level 2 or
level 1 table entry, respectively, even if it is invalid (PTE.ET=0). The table
walk may not complete for any of the following reasons:


■ The entry accessed is reserved (PTE.ET=3);

■ An intermediate level is not a PTP (PTE.ET ≠ 1);

■ A hardware error occurs.

In these cases, the value 0 is returned, and MFSR.FT is set to 4, which
indicates that a translation error has occurred.


❑ Context Probe (4G-byte). For a context probe, the level 0 entry accessed
is returned if it is a PTE (PTE.ET=2), a PTP (PTE.ET=1), or invalid
(PTE.ET=0). The value 0 is returned and MFSR.FT is set to 4 (translation
error) if the entry is reserved (PTE.ET=3), unless a hardware error
occurs.
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❑ Entire Probe. For an entire probe, the hardware does a regular table walk
and returns the valid PTE found in the appropriate level table. A PTE may
not be found for any of the following reasons:


■ An invalid (PTE.ET=0) or reserved (PTE.ET=3) entry is accessed in
an intermediate level;


■ The level 3 entry accessed is a PTP (PTE.ET=1);
■ A hardware error occurs.


In these cases, the value 0 is returned. If an invalid entry is accessed, the
MFSR.FT field is set to 1 (invalid address error); in the other cases, the FT
field is set to 4 (translation error).


When an entire probe completes successfully, the PTE accessed is loaded in 
the TLB, and, if necessary, the R bit is updated. For all the other probe opera- 
tions, the TLB is left unchanged and the R bit is not updated. 




Note: 


SuperSPARC generates a data access exception on probe types
0x5-0x7.





9.8.2 Demap
The different types of demaps and the objects invalidated in the TLB are shown
in Table 9-6.
Table 9-6. Demap Types


The TLB hit criteria for a demap are shown in Table 9-7. 
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Table 9-7. TLB Demap Hit Criteria

Demap Type   Access                        Virtual Address     Level
Page         Context = MCTX.CTX            VPN = VDPA[31:12]   level = 3
             or ACC = 6 or ACC = 7
Segment      Context = MCTX.CTX            VPN = VDPA[31:18]   level = 3 or 2
             or ACC = 6 or ACC = 7
Region       Context = MCTX.CTX            VPN = VDPA[31:24]   level = 3 or 2 or 1
             or ACC = 6 or ACC = 7
Context      Context = MCTX.CTX            -                   -
             and ACC < 6





VPN = VDPA[31:xx] indicates that the corresponding bit fields of the Virtual
Demap or Probe Address issued by the processor and the virtual address
contained in a TLB Virtual Page Number (VPN) match; see Figure 9-16.


Context = MCTX.CTX indicates that the contents of the context register (see 
9.12.2) and the context field in the MMU entry match. Note that context is not 
compared for any page classified as a supervisor page due to the ACC field 
being either 6 or 7. In this manner, supervisor pages are present in all contexts 
simultaneously. 


Level denotes the level at which the PTE is found. 


In a multiprocessor system, distinct processors can hold copies of the same 
PTE in their MMUs. Therefore, when a portion of a virtual space is demapped, 
the demap operation should be applied to all MMUs in the system.


Depending on the system configuration, demap operations may be broadcast 
automatically to all processors or may require explicit software intervention on 
all processors to demap the required pages. SuperSPARC allows for automat- 
ic demapping only when used with the MultiCache Controller (MXCC). MXCC 
implements system demaps on XBus. When SuperSPARC is used with an 
MXCC on MBus or when it is used directly on MBus without an MXCC, all pro- 
cessors must be interrupted and requested to flush their own TLBs whenever 
the page table is modified. See also Section 8.5. 





Note: 


Software must guarantee that only a single demap operation is in progress
at any one time across the entire system. Inconsistent operation will result
if two demaps are received by a processor at any one time (including internal
demap requests).


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When an MMU demap has been completed by all processors (either through 
hardware or software), the following should be true: 


❑ All memory references to the virtual space(s) mapped by the flushed
PTE(s) that were issued before the demap must have been completed.


❑ No MMU has a valid copy of the PTE(s).


❑ All memory references to the PTE(s) themselves that were issued
before the demap have been completed.


Once the system has reached this state, page tables may safely be modified
and applications allowed to continue.
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9.9 MMU Transparent Mode 


The ASI values 0x20-0x2f are used to bypass the MMU for data accesses. The
MMU does not translate the address through the TLB; instead, the physical
address is formed from the ASI and the virtual address as follows (illustrated
in Figure 9-8):


PA[35:32] = ASI[3:0], PA[31:0] = VA[31:0].


Cacheability of these accesses is determined by the MCNTL.AC bit (see
Subsection 9.12.1). Since these transactions are based entirely on physical
memory addresses, they can have no effect on virtual memory components,
such as the MMU R&M bits.
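The transparent-mode address formation is simple enough to state directly in Python; `transparent_pa` is a hypothetical helper used only for illustration.

```python
def transparent_pa(asi, va):
    """PA[35:32] = ASI[3:0], PA[31:0] = VA[31:0] for ASIs 0x20-0x2f."""
    if not 0x20 <= asi <= 0x2F:
        raise ValueError("not an MMU transparent-mode ASI")
    return ((asi & 0xF) << 32) | (va & 0xFFFFFFFF)
```

So a load alternate from ASI 0x2f at virtual address 0x12345678 reaches physical address 0xF12345678 without any TLB lookup.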


Figure 9-8. MMU Transparent Mode Translation





9.10 Address Translation Modes 


Table 9-8 summarizes the different translation modes used by the Super- 
SPARC processor. 


Table 9-8. Address Translation Modes

Translation Mode    Instruction Fetch           Data Access
                    (ASI=0x08 or 0x09)          (ASI=0x0a or 0x0b)
Boot Mode           PA[35:28]=0xff              EN=0: Disabled Mode
(MCNTL.BT=1,        PA[27:0]=VA[27:0]           EN=1: Enabled Mode
MCNTL.EN=X)
MMU Disabled        PA[35:32]=0x0               PA[35:32]=0x0
(MCNTL.BT=0,        PA[31:0]=VA[31:0]           PA[31:0]=VA[31:0]
MCNTL.EN=0)
MMU Enabled         PA[35:12] from PTE          PA[35:12] from PTE
(MCNTL.BT=0,        PA[11:0]=VA[11:0]           PA[11:0]=VA[11:0]
MCNTL.EN=1)
MMU Transparent     Not Applicable              LDA and STA with ASI=0x20-0x2f:
                                                PA[35:32]=ASI[3:0],
                                                PA[31:0]=VA[31:0]

BT is the Boot Mode bit of the MMU control register.
EN is the MMU Enable bit of the MMU control register.
PA[bit range] are the bits of the physical address.
VA[bit range] are the bits of the virtual address.
The ASIs used above are:


❑ 0x08 - User Instruction Space.

❑ 0x09 - Supervisor Instruction Space.

❑ 0x0a - User Data Space.

❑ 0x0b - Supervisor Data Space.

❑ 0x20-0x2f - MMU Transparent Mode (see Section 9.9).
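The boot, disabled, and enabled rows of Table 9-8 can be combined into one Python sketch. Assumptions: `bt` and `en` are the MCNTL.BT and MCNTL.EN bits, only 4K-byte page mappings are modeled in enabled mode, and the PPN is supplied by the caller instead of coming from a TLB lookup or table walk. The function name is hypothetical.

```python
def physical_address(bt, en, is_fetch, va, pte_ppn=None):
    """Return the 36-bit PA for an instruction fetch or data access."""
    if bt and is_fetch:
        return (0xFF << 28) | (va & 0x0FFFFFFF)   # boot mode fetch
    if not en:
        return va & 0xFFFFFFFF                    # MMU disabled: PA[35:32]=0
    if pte_ppn is None:
        raise LookupError("no translation supplied (TLB miss)")
    return (pte_ppn << 12) | (va & 0xFFF)         # 4K-byte page mapping
```

Data accesses in boot mode fall through to the EN test, matching the "EN=0: Disabled Mode, EN=1: Enabled Mode" entry in the table.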
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9.11 No-Fault Operation 


The MCNTL.NF bit, when set, turns on no-fault operation. In this mode, most
exceptions normally reported to the pipeline are disabled. This mode is
intended for use by system software during the processing of exceptions and
during system diagnostic functions. The NF bit should not be set during user
code execution.


Any memory transaction that encounters an error or an exception while in no-fault mode does not trap, except in the cases listed below. If a load encounters an exception, its destination register is updated with indeterminate data. If a store encounters an exception, no registers are updated, and the system designer is responsible for ensuring that the system does not modify memory.


When operating with NF set, software should verify the success or failure of every memory transaction by explicitly reading the fault status register (MFSR).


In general, all normal memory exceptions are disabled by NF. Several types of exceptions, however, are not disabled by NF; these are all considered fatal (unrecoverable) errors and induce error mode. The following exceptions are not disabled by NF:


- Internal Error
  Errors such as multiple tag matches from the caches are considered fatal internal errors.

- Control Space Error
  Errors reading or writing SuperSPARC ASI registers are considered fatal.

- Supervisor Instruction (ASI 0x09)
  Supervisor instruction fetch errors cannot be disabled by NF, since the processor has effectively received an error in the instructions it needs to execute. Because there is no other source for instructions, an exception must be generated.

- User Instruction (ASI 0x08)
  Instruction fetches (explicitly not alternate-space reads and writes) from ASI 0x08 (User Instruction Space) cause exceptions to be reported. As above, no instructions could be executed without the exception.

- Unassigned ASIs
  Unassigned ASIs trap regardless of MCNTL.NF.
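The ASI rules above and in the MCNTL.NF description reduce to a small predicate. This is a minimal C sketch: the name `nf_masks_fault` is hypothetical, `is_fetch` separates a real instruction fetch from an LDA/STA alternate-space access using the same ASI, and internal errors (never masked) are not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Does MCNTL.NF suppress a fault for an access on this ASI?
 * Masked: ASIs 0x08 (LDA/STA only), 0x0a, 0x0b, 0x20-0x2f.
 * Not masked: ASI 0x09, user instruction fetches, control space (0x02),
 * SuperSPARC internal ASIs and unassigned ASIs. */
static bool nf_masks_fault(bool nf, unsigned asi, bool is_fetch)
{
    assert(asi <= 0xFFu);                 /* ASIs are 8 bits */
    if (!nf)
        return false;
    if (asi == 0x08)
        return !is_fetch;                 /* user fetches still fault */
    if (asi == 0x0a || asi == 0x0b)
        return true;                      /* user/supervisor data */
    return asi >= 0x20 && asi <= 0x2f;    /* transparent mode */
}
```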


Note:


It is the responsibility of system software to ensure that the NF bit is never set upon return to user code. Any load that receives an exception masked by the NF bit loads indeterminate data into its destination register. Any store that receives an exception masked by NF has no effect on registers (guaranteed) or memory (system dependent).
9.12 MMU Registers (ASI=0x04)


Accesses to ASI 0x04 read and write the SPARC Reference MMU control registers. These registers are all 32 bits wide, and their addresses are shown in Table 9-9. Attempts to access them with byte, halfword, or doubleword operations result in a data access exception. Virtual address bits [12:8] select individual registers; all other bits are ignored and should be 0. Any access to an address other than those defined in Table 9-9 causes a data access exception.
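The register selection rule can be sketched as follows. The macro and enumerator names are hypothetical; only the four addresses stated in Subsections 9.12.3.6 and 9.12.4 (0x300, 0x400, 0x1300, 0x1400) are encoded, and rejection of undefined selectors is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* ASI 0x04 register selector: VA[12:8]; all other VA bits are ignored. */
#define MMU_REG_SEL(va) (((uint32_t)(va) >> 8) & 0x1Fu)

/* Selectors for the addresses given in the text (illustrative names). */
enum {
    SEL_MFSR    = MMU_REG_SEL(0x00000300u),  /* MFSR, cleared on read */
    SEL_MFAR    = MMU_REG_SEL(0x00000400u),  /* MFAR, read-only */
    SEL_MFSR_RW = MMU_REG_SEL(0x00001300u),  /* MFSR, read/write */
    SEL_MFAR_RW = MMU_REG_SEL(0x00001400u),  /* MFAR, read/write */
};
static_assert(SEL_MFSR_RW == 0x13, "VA 0x1300 selects the read/write MFSR");

#define MMU_ERR_SIZE (-1)

/* Returns the selector, or MMU_ERR_SIZE for a non-word access, which the
 * hardware reports as a data access exception. */
static int mmu_reg_decode(uint32_t va, unsigned size_bytes)
{
    if (size_bytes != 4)
        return MMU_ERR_SIZE;
    return (int)MMU_REG_SEL(va);
}
```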


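Table 9-9. MMU Control Registers

Address | Register
0x00000000 | MMU Control Register (MCNTL)
0x00000100 | Context Table Pointer Register (MCTP)
0x00000200 | Context Register (MCTX)
0x00000300 | Fault Status Register (MFSR), cleared on read
0x00000400 | Fault Address Register (MFAR), read-only
0x00001300 | Fault Status Register (MFSR), read/write
0x00001400 | Fault Address Register (MFAR), read/write
(not legible in source) | Shadow Fault Status Register (MSFSR)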


9.12.1 MMU Control Register (MCNTL) 


The control register contains the general MMU control and status flags, as well as the control flags for the instruction and data caches. The MMU control register is shown in Figure 9-9.










Figure 9-9. MMU Control Register

31-28 | 27-24 | 23-17    | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9  | 8  | 7   | 6-2      | 1  | 0
Impl  | Ver   | reserved | TC | AC | SE | BT | PE | MB | SB | IE | DE | PSO | reserved | NF | EN

Impl Implementation number of the SuperSPARC chip. This field is hardwired and read-only; SuperSPARC fixes MCNTL.Impl = 0x0.

Ver Version number of the SuperSPARC chip. This field is hardwired and read-only.

reserved Reserved. These bits are ignored on write and read as zero.

TC Table Walk Cacheable Bit. When this bit is set, references from MMU table walks are cached in the SuperSPARC external cache. When this bit is cleared, table walk accesses are not cached in the external cache. Table walk references are never cached internally. When SuperSPARC is operating directly on the MBus (with no MXCC), TC must be deasserted.

AC Alternate Cacheable Bit. This bit indicates whether an access is cacheable when there is no PTE C bit to consult, because the MMU is disabled or is in a mode where PTEs are not needed for translation. The only exception is instruction fetches in boot mode, which are always non-cacheable. When this bit is clear, memory accesses whose physical address is not obtained through a PTE are not cached in the SuperSPARC internal caches or the external cache. When this bit is set, these memory accesses are cached.

The cacheability of MMU table walks is controlled by TC, not by this bit.

SE Snoop Enable. When set, this bit enables cache snooping on the SuperSPARC bus. It must be set to enable the cache-consistency mechanisms. Setting this bit does not affect store buffer snooping, which is always enabled if the store buffer is enabled.

In an SSP operating directly on the MBus (without MXCC), both the instruction and data caches snoop regardless of whether they are enabled (IE and DE bits) and regardless of the state of SE. This is necessary for maintaining consistency should any of the caches be disabled after they contain valid data. Therefore, power-on reset initialization code should flash clear these caches before enabling snoops, so that no garbage data resides in the snooping caches.

BT Boot Mode. A clear BT bit indicates normal SPARC Reference MMU operation; a set BT bit indicates boot mode. Physical addresses for instruction fetches in boot mode are formed by adding the address 0xff0000000 to the lower 28 bits of the virtual address. Instruction accesses in boot mode bypass the SuperSPARC internal cache. Data accesses are unaffected by boot mode.
PE Parity Enable. When set, even parity is generated and checked by SuperSPARC. When zero, odd parity is generated but not checked.


MB MBus Mode. When set, this bit indicates that the SSP is connected directly to the MBus without an MXCC. This bit is read-only and reflects the state of the CCMODE pin.


SB Store Buffer Enable. This bit enables store buffer operation. When clear, stores do not go to the store buffer and occur synchronously (see Chapter 8). When set, stores are buffered in the store buffer (see Section 10.6) and complete as soon as they are entered.

IE Instruction Cache Enable. This bit enables the instruction cache (see Section 10.2).

DE Data Cache Enable. This bit enables the data cache (see Section 10.4).

PSO Partial Store Ordering. When set, the memory model is Partial Store Ordering (PSO); when clear, it is Total Store Ordering (TSO) (see Chapter 8). The SB bit must be set (enabling the store buffer) to get either PSO or TSO mode.


reserved Reserved. These bits are ignored.


NF No-Fault Bit. When this bit is set, faults that occur for ASIs 0x08, 0x0a, 0x0b, and 0x20-0x2f are ignored (not reported to the processor). The remaining ASIs, including ASI 0x09, SuperSPARC internal ASIs, and control space (ASI 0x02), are not affected by this bit. Note that the MFSR is always updated regardless of whether the fault is taken (see Section 9.11).


EN MMU Enable. This bit enables or disables MMU operation. If the BT bit is set, the EN bit is effective for data cycles only; instruction fetches use boot mode translation. When the MMU is disabled, the virtual address is used as the physical address without translation.


On power-on reset, all the control bits in MCNTL are cleared, except for the 
BT bit, which is set. 
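The following C sketch encodes the MCNTL bit positions of Figure 9-9 (treat the positions as an assumption to verify against your part) and shows one way to leave boot mode and read back the store model. The helper names are hypothetical, and steps such as flash-clearing the caches before enabling snooping are omitted.

```c
#include <assert.h>
#include <stdint.h>

#define MCNTL_EN  (1u << 0)   /* MMU enable */
#define MCNTL_NF  (1u << 1)   /* no-fault */
#define MCNTL_PSO (1u << 7)   /* partial store ordering */
#define MCNTL_DE  (1u << 8)   /* data cache enable */
#define MCNTL_IE  (1u << 9)   /* instruction cache enable */
#define MCNTL_SB  (1u << 10)  /* store buffer enable */
#define MCNTL_MB  (1u << 11)  /* MBus mode (read-only) */
#define MCNTL_PE  (1u << 12)  /* parity enable */
#define MCNTL_BT  (1u << 13)  /* boot mode */
#define MCNTL_SE  (1u << 14)  /* snoop enable */
#define MCNTL_AC  (1u << 15)  /* alternate cacheable */
#define MCNTL_TC  (1u << 16)  /* table walk cacheable */
#define MCNTL_IMPL(v) (((v) >> 28) & 0xFu)
#define MCNTL_VER(v)  (((v) >> 24) & 0xFu)

/* Value to write when leaving boot mode: keep the other bits as read,
 * clear BT and NF, and enable the MMU, both caches and the store buffer. */
static uint32_t mcntl_leave_boot(uint32_t current)
{
    assert((current & MCNTL_BT) != 0);   /* sketch expects boot mode */
    uint32_t v = current & ~(MCNTL_BT | MCNTL_NF);
    return v | MCNTL_EN | MCNTL_IE | MCNTL_DE | MCNTL_SB;
}

enum store_model { STORES_SYNCHRONOUS, STORES_TSO, STORES_PSO };

/* PSO or TSO needs SB set; with SB clear, stores complete synchronously. */
static enum store_model mcntl_store_model(uint32_t v)
{
    if (!(v & MCNTL_SB))
        return STORES_SYNCHRONOUS;
    return (v & MCNTL_PSO) ? STORES_PSO : STORES_TSO;
}
```

Starting from the power-on value (only BT set, Impl = 0), `mcntl_leave_boot` yields 0x701.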


9.12.2 Context Table Pointer Register (MCTP) and Context Register (MCTX) 


The context table pointer register (MCTP) contains a pointer to the context 
table in physical memory. The context table pointer register is shown in 
Figure 9-10. 
Figure 9-10. Context Table Pointer Register (MCTP)

31-8 | 7-0
CTP  | reserved

CTP Context Table Pointer. The context table pointer contributes up to 24 bits of the root pointer.


reserved Reserved. These bits are ignored on writes and read as zero.


The Context Register (MCTX) contains the displacement in the context table 
to access the root pointer. It defines the current virtual address space. The con- 
text register has the structure shown in Figure 9-11. 


Figure 9-11. Context Register (MCTX)

31-16    | 15-0
reserved | CTX

reserved Reserved. These bits are ignored on writes and read as zero.

CTX Context Number. The context number contributes up to 16 bits in the root pointer.


The context table contains the root page table pointers for all the contexts. The 
MCTX provides the offset into the context table to retrieve the root page table 
pointer for that context. The physical address used to retrieve the root page 
table pointer is formed as shown in Figure 9-12. 


Figure 9-12. Root Pointer Physical Address Generation

MCTP:  CTP[31:14] | CTP[13:8] | reserved[7:0]
MCTX:  reserved[31:16] | CTX[15:10] | CTX[9:0]

Root pointer physical address:
PA[35:18] | PA[17:12]               | PA[11:2] | PA[1:0]
CTP[31:14] | CTP[13:8] OR CTX[15:10] | CTX[9:0] | 00
This address formation lets the kernel trade off the number of context bits against alignment restrictions on the context table. Note that, if a 16-bit context is desired, the context table must be aligned to a 256K-byte boundary.
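The following minimal C sketch shows the root-pointer address formation and the alignment rule. It assumes 4-byte context table entries, with CTP[31:8] landing on PA[35:12] and CTX landing on PA[17:2] as in Figure 9-12. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Root pointer PA: MCTP[31:8] supplies PA[35:12], CTX[15:0] supplies
 * PA[17:2]; the overlapping bits PA[17:12] are OR-ed together. */
static uint64_t root_pointer_pa(uint32_t mctp, uint32_t mctx)
{
    uint64_t table = (uint64_t)(mctp & 0xFFFFFF00u) << 4;  /* CTP -> PA[35:12] */
    uint64_t index = (uint64_t)(mctx & 0xFFFFu) << 2;       /* 4-byte entries */
    return table | index;
}

/* Alignment (bytes) that makes the OR behave like an add for a context
 * number ctx_bits wide: one 4-byte entry per context, but never less
 * than 4K bytes, because CTP[8] maps to PA[12]. */
static uint64_t context_table_alignment(unsigned ctx_bits)
{
    assert(ctx_bits <= 16);
    uint64_t size = 4ull << ctx_bits;
    return size < 4096 ? 4096 : size;
}
```

With 16 context bits, `context_table_alignment` returns 262144 (256K bytes), matching the requirement stated above.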


Table 9-10 gives the alignment requirements for the context table for different 
widths for the context register. 


Table 9-10. Alignment Requirements 





9.12.3 MMU Fault Status Register (MFSR) 


The fault status register provides information on SuperSPARC faults associated with the memory system and other internal error sources. The MFSR, together with the reported trap type, is used to distinguish between the various types of errors and faults that can occur.


There are several general types of exceptions: instruction access faults, data access faults, store buffer exceptions, and internal errors.


9.12.3.1 Instruction Access Errors 
Instruction access faults are signalled through the instruction access exception trap. There are several possible sources for the exception. It may be created by normal page faults reported by the MMU; MMU-generated errors are distinguished by the encoding of the MFSR.FT (fault type) field (see Subsection 9.12.3.7), which indicates the MMU fault type. The fault may be externally generated by events such as bus timeouts and parity errors; externally generated faults are indicated by various bits in the MFSR that are tied to equivalent error responses on the external buses. A final source of errors is internally detected inconsistencies. For example, in the case of a multiple tag match in the instruction cache, SuperSPARC enters error mode and generates a watchdog reset, and the MFSR.EM (error mode) bit (see Subsection 9.12.3.7) is set.
9.12.3.2 Data Access Errors


Data access faults are signalled through the data access exception trap. As with instruction access errors, page faults, external bus errors, and internal errors may all cause exceptions, which are indicated in the various fields of the MFSR. Additional errors can be caused by erroneous accesses to internal ASI control spaces. These are indicated by the MFSR.CS (control space access error) status bit (see Subsection 9.12.3.7).


9.12.3.3 Store Buffer Errors 


Store buffer errors are signalled as data store error traps. These errors are 
asynchronous exceptions; they are reported after the apparent completion of 
the instruction that caused them. When this type of error occurs, the MFSR.SB 
(store buffer error) bit will be asserted. The source of the error is generally a 
parity, timeout, or similar error on the bus. The erroring store has already been 
successfully translated by the MMU; otherwise, a data access exception 
would have been reported. The operation can be recovered by reading the 
contents of the store buffer and explicitly completing the transaction using 
transparent MMU physical address memory references or other means. De- 
pending on the error, software may be able to restore the system state and 
continue operation. See Section 10.6 for a detailed description of store buffer 
errors and operation. 


9.12.3.4 Control Space Access Errors 


Illegal ASI operations cause data access exceptions and set the MFSR.CS (control space access error) bit.


Any of the following four conditions can cause these errors:

- References to an invalid ASI.

- References to a legal ASI, but with an invalid data size.

- An invalid virtual address field within a valid ASI.

- A bus error on ASI 0x02 (control space).

The MFSR.CS bit is not asserted under either of the following conditions:


- A bus error on ASI 0x08, 0x09, 0x0a, 0x0b, or 0x20-0x2f (the standard and bypass ASIs).


- Errors on MMU probes.


The Fault Address Register (MFAR) is set to the faulting address when MFSR.CS (control space access error) is set.


9.12.3.5 Error Mode and Internal Errors


The SPARC Architecture specifies that a processor that takes a precise or new 
deferred exception with PSR.ET = 0 will enter error mode. 
SuperSPARC also enters error mode if an internal error occurs. One example is the detection of multiple tag matches within the instruction or data cache. In these cases, the MFSR.FT field is set to 6, indicating an internal error.


Whenever SuperSPARC enters error mode, MFSR.EM is set. SuperSPARC exits error mode by taking a watchdog reset. The reset handler should examine MFSR.EM to distinguish software-induced error conditions from a hardware reset.


Note:


When an internal error occurs and a watchdog reset is taken, only MFSR.EM and MFSR.FT are meaningful. The states of the other MFSR bits are not guaranteed.





9.12.3.6 MFSR Timing and Operation 
The MFSR is guaranteed to be valid only after instruction, data, and store buffer exceptions (instruction access exception, data access exception, and data store error, respectively). The contents of the fault status register can be misleading if examined at an arbitrary time. For example, an instruction fetch that is not a demand fetch (not needed immediately for execution) may cause the fault status register to indicate a fault that is never signalled as an actual exception.


Read access to the MFSR is at virtual address 0x00000300 using ASI 0x04. 
When read from this address, the MFSR is cleared. MFSR can be read and 
written at virtual address 0x00001300 using ASI 0x04. 


The MFSR.SB (store buffer) error bit is sticky. In other words, once it is set, it is not overwritten by the occurrence of any other exception. This ensures that store buffer error events are never lost. The SB bit is cleared on any read of the MFSR; it can also be cleared by explicitly writing to the read/write version of the MFSR.
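The clear-on-read and sticky-SB rules can be pictured with a small software model. This is a hypothetical sketch for illustration, not a description of the hardware logic. It assumes MFSR.SB is bit 15, following the EBE field order MFSR[17:10] given in the Note in Subsection 9.12.3.7, and it does not model the priority and overwrite rules of Table 9-11.

```c
#include <assert.h>
#include <stdint.h>

#define MFSR_SB (1u << 15)   /* assumed position of MFSR.SB */

/* Hypothetical software model of the MFSR access rules. */
struct mfsr_model { uint32_t value; };

/* A new fault updates the register, but SB stays set once set. */
static void mfsr_record(struct mfsr_model *m, uint32_t status)
{
    assert(m);
    m->value = status | (m->value & MFSR_SB);
}

/* Read at VA 0x00000300 (ASI 0x04): returns the value and clears it. */
static uint32_t mfsr_read_clear(struct mfsr_model *m)
{
    assert(m);
    uint32_t v = m->value;
    m->value = 0;
    return v;
}

/* Write at VA 0x00001300 (ASI 0x04): replaces the value, so it can
 * explicitly clear SB. */
static void mfsr_write_rw(struct mfsr_model *m, uint32_t v)
{
    assert(m);
    m->value = v;
}
```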
Note:


Translation errors are considered high-priority errors. The occurrence of a translation error cannot be overwritten by any other type of error. Even if the exception associated with a translation error is not taken (due to other exceptions, prefetches, branches, etc.), the MFSR.FT field continues to indicate the occurrence of a translation error.


This implies that, under certain circumstances, a trap may be incorrectly 
stated to be a translation error when in fact it may have been due to other 
causes. System software should be able to recover from this situation by us- 
ing probe operations to test the validity of translations, and, if correct, retrying 
the instruction that reported the exception. If the true source of the exception 
continues to exist, it will be reported again. 


Translation errors may be overwritten by other translation errors, in which 
case the MFSR.OW (overwrite) bit will be set. 





General rules for overwriting the MFSR (and MFAR) are presented in 
Table 9-11. The MFSR.OW bit will always be set if an overwrite condition 
occurs. 


Table 9-11. MFSR Overwrite Operations

Pending Error | New Error | OW State | Action Taken
Translation Error | Translation Error | Set | Translation Error
Data Access Exception | Data Access Exception | Set | Data Access Exception
Data Access Exception | Instruction Access Exception | Clear | Data Access Exception
Instruction Access Exception | Instruction Access Exception | Set | Instruction Access Exception



















In all cases in which simultaneous errors occur, SuperSPARC will choose the 
highest-priority error and update the status accordingly. The positional priority 
described above must also be taken into account. The priority order is listed 
in Table 9-12 (priority 1 is the highest priority). 
Table 9-12. Priority Order for Errors

Error | Priority
Translation Errors | 1
Data Access Exceptions | 2
Instruction Access Exceptions | 3
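The selection among simultaneous errors can be sketched as a minimum search over priority values. This C sketch assumes the ranking translation errors = 1, data access exceptions = 2, instruction access exceptions = 3 (priority 1 highest). The enum and function names are hypothetical, and positional priority is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Enumerator value = priority; lower value wins. */
enum mmu_error { ERR_TRANSLATION = 1, ERR_DATA_ACCESS = 2, ERR_INSN_ACCESS = 3 };

/* Return the highest-priority error among n simultaneous errors. */
static enum mmu_error highest_priority(const enum mmu_error *errs, size_t n)
{
    assert(errs && n > 0);
    enum mmu_error best = errs[0];
    for (size_t i = 1; i < n; i++)
        if (errs[i] < best)
            best = errs[i];
    return best;
}
```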


9.12.3.7 MFSR Register Description 


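31-18    | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9-8 | 7-5 | 4-2 | 1   | 0
reserved | EM | CS | SB | P  | UD | UC | TO | BE | L   | AT  | FT  | FAV | OW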


reserved Reserved. These bits are ignored on write and read as zeroes. 


EM Error Mode Reset. When set, this bit indicates that an Error 
Mode Reset has been taken. 

CS Control Space Access Error. This bit is set to signal errors during ASI accesses. See Subsection 9.12.3.4.

SB Store Buffer Error. When set, this bit indicates that a store buffer error (data store error) has occurred.

P Parity Error. When set, this bit indicates that the internal parity checkers detected a VBus parity error. Parity errors also set the MFSR.UC bit.

UD Undefined Error. This bit is set for external bus errors when the external system signals an unfinished error. UD/UC/TO/BE errors are mutually exclusive.


UC Uncorrectable Error. This bit is set for external bus errors for which an uncorrectable error occurred. Parity errors and ECC errors are also reported as uncorrectable errors. UD/UC/TO/BE errors are mutually exclusive.


TO Time-Out. When this bit is set, it indicates that a time-out error occurred for an external bus transaction. UD/UC/TO/BE errors are mutually exclusive.

BE Bus Error. When this bit is set, it indicates that a bus error occurred on a faulting access. This includes invalid bus transactions. UD/UC/TO/BE errors are mutually exclusive.
Note:


The bits EM/CS/SB/P/UD/UC/TO/BE in the MFSR are defined in the SPARC Architecture as the EBE field (MFSR[17:10]). This EBE field always records the latest bus error cause and, in some cases, may deviate from the SPARC Architecture rule that states:


"A lower priority fault may not overwrite the MFSR status of a higher priority fault."


That is, if a lower-priority fault occurs after a higher-priority fault but before the MFSR has been read, the MFSR should remain unchanged. SuperSPARC complies with this rule for every bit except those in the EBE field.


For example, a data table walk suffers a timeout, and the MFSR correctly records the status of this translation fault, setting MFSR.[AT,FT,TO]. Then a demand fetch suffers a bus error; this is an instruction access fault, which has lower priority than a translation fault. MFSR.[AT,FT] are correctly left unchanged (retaining the status of the translation fault rather than the new instruction access fault), but MFSR.TO is not retained. Instead, the EBE field is updated and MFSR.BE is set.




L Level. This field is set to the page table level of the entry that caused the fault. If an external bus error is encountered while fetching a page table entry (either a PTE or PTP), the level field records the page table level for the entry. The field is defined in Table 9-13.


Table 9-13. Field Levels

L | Level
0 | Entry in context table
1 | Entry in level-1 page table
2 | Entry in level-2 page table
3 | Entry in level-3 page table


AT Access Type. This field defines the type of access that caused 
the fault. See Table 9-14. 
Table 9-14. Access Types

AT | Access Type
0 | Load from User Data Space
1 | Load from Supervisor Data Space
2 | Load/Execute from User Instruction Space
3 | Load/Execute from Supervisor Instruction Space
4 | Store to User Data Space
5 | Store to Supervisor Data Space
6 | Store to User Instruction Space
7 | Store to Supervisor Instruction Space
FT Fault Type. This field defines the type of the current fault. See Table 9-15.

Table 9-15. Fault Types

FT | Fault Type
0 | None
1 | Invalid address error
2 | Protection error
3 | Privilege violation error
4 | Translation error
5 | Access bus error
6 | Internal error
7 | Reserved

FT = 1. The invalid address error code is set when an invalid PTE or PTP is found while fetching an entry from the page table for a regular table walk or a probe operation.

FT = 2. The protection error code is set if an access is attempted that is inconsistent with the protection attributes of the corresponding page table entry.

FT = 3. The privilege error code is set when a user program attempts to access a supervisor-only page.
FT = 4. A translation error code is set when an external bus error, reserved PTE, or level-3 PTP is found while fetching an entry from a page table for a regular table walk or a probe operation. The L field records the page table level at which the error occurred for the above two error codes. The UD, TO, BE, and UC fields record the type of bus error, if any.


FT = 5. An access bus error code is set when an external bus error occurs during a memory access.


FT = 6. The internal error code is set when either cache detects an internal inconsistency, such as multiple matches for a particular request. An internal error causes the chip to enter error mode, causing a watchdog reset.


Invalid address errors, protection errors, and privilege violations 
are a function of the access type and the ACC field of the corre- 
sponding PTE (as shown in Table 9-16). The errors are set as 
in Table 9-17. 


Table 9-16. Access Permission Codes

Table 9-17. Access Fault Type (FT Field)

FT Code, for PTE[V]=1, by PTE[ACC] value





FAV Fault Address Valid. This bit is asserted if the contents of the MFAR are valid. The MFAR is not valid for instruction access faults (see Subsection 9.12.4).

OW Overwrite. This bit is asserted if the fault status register (MFSR) has been written more than once by faults of the same class since the last time it was read (see Subsection 9.12.3.6).
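The following C sketch decodes the MFSR fields and computes the FT value expected for a given access type and PTE. It rests on several assumptions. The L, AT, FT, FAV, and OW positions follow the SPARC Reference MMU layout, and EBE = MFSR[17:10] is taken from the Note above. The permission masks assume the SPARC Reference MMU ACC encoding (user/supervisor): 0 = R/R, 1 = RW/RW, 2 = RX/RX, 3 = RWX/RWX, 4 = X/X, 5 = R/RW, 6 = none/RX, 7 = none/RWX. All macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MFSR_OW(v)  ((v) & 1u)
#define MFSR_FAV(v) (((v) >> 1) & 1u)
#define MFSR_FT(v)  (((v) >> 2) & 7u)
#define MFSR_AT(v)  (((v) >> 5) & 7u)
#define MFSR_L(v)   (((v) >> 8) & 3u)
#define MFSR_EBE(v) (((v) >> 10) & 0xFFu)

enum { FT_NONE = 0, FT_INVALID = 1, FT_PROTECTION = 2, FT_PRIVILEGE = 3 };

/* Expected FT for access type `at` (Table 9-14) against a PTE with the
 * given V bit and ACC code. Bit n of each mask is set when ACC code n
 * grants the permission. */
static unsigned expected_ft(unsigned at, unsigned acc, int valid)
{
    static const uint8_t rd_user = 0x2F, rd_sup = 0xEF; /* ACC 0,1,2,3,5 (+6,7) */
    static const uint8_t ex_user = 0x1C, ex_sup = 0xDC; /* ACC 2,3,4 (+6,7) */
    static const uint8_t wr_user = 0x0A, wr_sup = 0xAA; /* ACC 1,3 (+5,7) */
    uint8_t allowed;
    int sup;

    assert(at <= 7 && acc <= 7);
    if (!valid)
        return FT_INVALID;                 /* PTE not valid */
    sup = (int)(at & 1u);                  /* odd access types are supervisor */
    if (!sup && acc >= 6)
        return FT_PRIVILEGE;               /* user access, supervisor-only page */
    if (at <= 1)
        allowed = sup ? rd_sup : rd_user;  /* loads from data space */
    else if (at <= 3)
        allowed = sup ? ex_sup : ex_user;  /* load/execute from instruction space */
    else
        allowed = sup ? wr_sup : wr_user;  /* stores to data or instruction space */
    return ((allowed >> acc) & 1u) ? FT_NONE : FT_PROTECTION;
}
```

For example, a supervisor data load (AT = 1) from an execute-only page (ACC = 4) yields a protection error. A user load (AT = 0) from an ACC = 6 page yields a privilege violation.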


9.12.4 MMU Fault Address Register 
The MFAR records the virtual address of the same faults reported in the MFSR. The MFAR is overwritten according to the policy defined for the MFSR (see Subsection 9.12.3.6). The MFAR is read-only according to the Reference MMU specification. For diagnostic purposes, SuperSPARC provides additional read/write access to the MFAR at virtual address 0x00001400 in ASI 0x04. The normal read-only MFAR is at virtual address 0x00000400 in ASI 0x04.




Note: 


SuperSPARC will never place instruction fault addresses in the MFAR. The 
information is not needed, since it is saved as the faulting PC/nPC when the 
trap occurs. 




Only word-sized access to the MFAR is supported; other access sizes provoke a data access exception. The structure of the fault address register is shown in Figure 9-13.
Figure 9-13. Fault Address Register


Note:


A rare scenario can occur when SuperSPARC is connected directly to the MBus with the store buffer off (not normal operating conditions). If a memory reference causes a copy-back, and the copy-back suffers a fault (memory fault), SuperSPARC responds by sending a data access exception to the pipeline. When this occurs, the MFAR holds the virtual address that caused the copy-out. Because this is not the address that caused the fault, SuperSPARC does not set MFSR.FAV.


If such an error occurs (MBus mode, data access exception, MFSR.FAV deasserted, store buffer disabled), system software can recover by forcing any modified data in the four possible cache lines (based on the virtual address) out to memory using transparent MMU references. Once this data has been flushed, the appropriate valid bits should be cleared, and the original operation may be retried. This situation is not expected to occur during normal operation.


9.12.5 MMU Shadow FSR Register (MSFSR) 


This is identical to the MFSR but is used to record memory system errors while 
in emulation mode. This is done to avoid destroying the regular MFSR be- 
cause of emulation instructions. This register is not used for normal operation. 
See Chapter 15 for more details. 
9.13 MMU Translation Look-Aside Buffer (TLB)


64 TLB entries, along with the root pointer and level-2 PTP, are accessible di- 
rectly with the ASI value 0x06. This direct-access capability is provided for 
diagnostic purposes and also to lock TLB entries. 


MMU entries are accessed as 32-bit words. Other data sizes provoke a 
data access exception. 


The address format is shown in Figure 9-14. 


Figure 9-14. MMU TLB Diagnostic Access Address 


reserved | TLBE | SEL | reserved


31 17 11 10 7 0


The various bit fields have the following meanings:

reserved Reserved. All reserved bits are ignored (but should be zero).

TLBE TLB Entry. Selects which of the 64 TLB entries is referenced.


SEL Select. Determines which part of the referenced entry is accessed. The SEL field allows selection of either TLB fields or values cached in the root pointer and PTP2 caches. The SEL field values are described in Figure 9-15.


Figure 9-15. TLB Sel Field 













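SEL | Contents
0 | TLB Entry Virtual Page Number
1 | TLB Entry Context Number
2 | TLB Entry Physical Page Number
3 | (not legible in source)
4 | Cached Root Pointer
5 | Cached Level-2 PTP
6 | Cached Level-2 PTP Virtual Address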


For SEL values 4, 5, and 6, the TLBE field is not used. Data is accepted or returned in the formats shown in Figures 9-16 through 9-22.
Figure 9-16. TLB Entry Virtual Page Number (SEL = 0)

31-12 | 11-0
VPN   | reserved

VPN Virtual Page Number. When the entry maps a 4K-byte page, all bits are defined. When the entry maps a 256K-byte segment, only the 14 most significant bits [31:18] are significant. When an entry maps a 16M-byte region, only the eight most significant bits [31:24] are significant. When an entry maps a 4G-byte region, none of the VPN bits is significant.


reserved Reserved. This field is ignored on writes and read as zero.
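The significant-bit rule above can be sketched as a compare mask chosen by mapping size. This is a minimal C sketch that assumes the SEL = 0 value keeps the VPN in bits [31:12] as shown in Figure 9-16. The function names are hypothetical, and the real LVL encoding of Table 9-18 is not used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* VA bits compared against a TLB entry's VPN, by mapping size in bytes. */
static uint32_t vpn_compare_mask(uint64_t map_bytes)
{
    switch (map_bytes) {
    case 4096ull:             return 0xFFFFF000u;  /* 4K page: VA[31:12] */
    case 256ull * 1024:       return 0xFFFC0000u;  /* 256K segment: VA[31:18] */
    case 16ull * 1024 * 1024: return 0xFF000000u;  /* 16M region: VA[31:24] */
    case 4ull << 30:          return 0x00000000u;  /* 4G: no VPN bits compared */
    default:
        assert(!"unsupported mapping size");
        return 0;
    }
}

static bool vpn_matches(uint32_t tlb_vpn, uint32_t va, uint64_t map_bytes)
{
    uint32_t m = vpn_compare_mask(map_bytes);
    return (tlb_vpn & m) == (va & m);
}
```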


Figure 9-17. TLB Entry Context Number (SEL = 1)

31-16    | 15-0
reserved | CTX

reserved Reserved. This field is ignored on writes and read as zero.


CTX Context. This field is the context number for which this TLB entry is valid.


Figure 9-18. TLB Entry Physical Page Number (SEL = 2)





Note: 


This format is slightly different from the in-memory PTE format. The mean- 
ings of the bit fields are identical to those in a PTE except for the V and the 
LVL fields. The referenced bit (R) in the PTE is used to hold the valid bit (V). 
The level field (LVL) is used to specify the mapping size. 





PPN Physical Page Number. The high-order 24 bits of the 36-bit physical address. If the PTE maps a 256K-byte segment, 16M-byte region, or 4G-byte context, the lower 6, 12, or 20 bits of the PPN, respectively, are ignored.


C Cacheable. If this bit is set to 1, the page is cacheable in the SuperSPARC internal (and external) caches. If it is 0, the page is not cacheable.


M Modified. When a page is accessed for writing and the modified bit is not set, the MMU sets the modified bit (M) in both the TLB and the main memory page table entry.

V Valid. This field is set if the TLB entry is valid.

ACC Access Permissions. This bit field encodes the access permissions associated with this TLB entry. The ACC field is encoded as shown in Table 9-1 or Table 9-16.


LVL Level. This field specifies the mapping size, as encoded in Table 9-18.


Table 9-18. Map Size Encoding





LVL | Mapping Size 


LOCK Lock Bit. If set to 1, the entry will not be displaced by the replacement algorithm. An entry can only be locked by storing a 1 in the lock bit; see Section 9.4. The Lock bit is set to 0 at power-on reset.


If all TLB entries are locked, no replacement can take place, and a TLB miss 
will result in the processor entering an infinite table walk sequence. The pro- 
cessor will continue to read from the page tables forever and never return. 
No error will be reported, and the only way to exit is with reset. 
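The PPN description above implies a simple address-combination rule: the low PPN bits that are ignored for larger mappings come from the virtual address instead. The following is a minimal C sketch; `tlb_physical_address` and its `offset_bits` parameter (12, 18, 24, or 32 for 4K, 256K, 16M, or 4G mappings) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Combine a TLB entry's 24-bit PPN (PA[35:12]) with the page offset.
 * For larger mappings, the low offset_bits - 12 PPN bits (6, 12, or 20)
 * are ignored and supplied by the VA instead. */
static uint64_t tlb_physical_address(uint32_t ppn, uint32_t va, unsigned offset_bits)
{
    assert(ppn <= 0xFFFFFFu);
    assert(offset_bits == 12 || offset_bits == 18 ||
           offset_bits == 24 || offset_bits == 32);
    uint64_t offset_mask = (1ull << offset_bits) - 1;      /* bits from the VA */
    uint64_t base = ((uint64_t)ppn << 12) & ~offset_mask;  /* drop ignored PPN bits */
    return base | ((uint64_t)va & offset_mask);
}
```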


Figure 9-20. Cached Root Pointer (SEL = 4)

31-2 | 1-0
RP   | V

RP Root Pointer. This is the cached physical address of the origin of the level-1 table.


V Valid Bit. A V field of 0 signifies an invalid root pointer, while a value of 1 signifies a valid root pointer. The values 10 and 11 (binary) are not used.


Figure 9-21. Cached Level-2 PTP (SEL = 5)

31-2  | 1-0
L2PTP | V

L2PTP Level-2 Page Table Pointer. This field holds a cached level-2 page table pointer that references the last level-3 page table accessed.


V Valid Bit. A V field of 0 signifies an invalid level-2 page table pointer, while a value of 1 signifies a valid level-2 page table pointer. The values 10 and 11 (binary) are not used.


Figure 9-22. Cached Level-2 PTP Virtual Address (SEL = 6)

31-18 | 17-0
L2VA  | reserved

L2VA Level-2 Page Table Pointer Virtual Address. This is the address tag for the cached level-2 page table pointer. On a TLB miss, if the cached root pointer is valid, the miss virtual address is compared with L2VA, and, if the corresponding bits are the same, the level-3 PTE can be accessed directly using L2PTP.


reserved Reserved. These bits are ignored on write and read as zero.
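The level-2 PTP shortcut on a TLB miss can be pictured with the following hypothetical C model. It assumes several things: the cached L2PTP has the Reference MMU PTP format (PTP[31:2] holds PA[35:6] of the level-3 table), L2VA holds VA[31:18] in place, level-3 tables hold 64 four-byte PTEs indexed by VA[17:12], and a V field of 1 means valid. It sketches the idea only, not the hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true and sets *pte_pa when the cached root pointer and level-2
 * PTP can be used to fetch the level-3 PTE for miss_va directly. */
static bool l2ptp_shortcut(uint32_t rp_reg, uint32_t l2ptp_reg,
                           uint32_t l2va_reg, uint32_t miss_va,
                           uint64_t *pte_pa)
{
    assert(pte_pa);
    if ((rp_reg & 3u) != 1u || (l2ptp_reg & 3u) != 1u)
        return false;                                  /* V fields must be 01 */
    if ((miss_va & 0xFFFC0000u) != (l2va_reg & 0xFFFC0000u))
        return false;                                  /* VA[31:18] tag mismatch */
    uint64_t table = (uint64_t)(l2ptp_reg & ~3u) << 4;  /* PTP[31:2] -> PA[35:6] */
    uint64_t index = (miss_va >> 12) & 0x3Fu;           /* VA[17:12] */
    *pte_pa = table | (index << 2);                     /* 4-byte PTEs */
    return true;
}
```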
Note:


It is possible to have multiple matches in the MMU if two or more entries mapping the same virtual address range are simultaneously present in the TLB. This cannot happen during the regular operating modes, in which entries are loaded by the table-walking hardware. With the direct access capability, however, it is possible to erroneously load distinct entries mapping the same portion of a virtual space. When a virtual address from this area is translated by the MMU, the result is undefined. This condition is not reported to software. Although operation is undefined, the hardware is internally protected and will not be damaged.


This condition may also occur in non-diagnostic situations if MMU demap transactions are not issued where required. For example, suppose a normal level-3 PTE is present in the TLB, the page table is then modified to include a level-2 or higher PTE mapping the same space, and a reference is made to a different location within the level-2 mapping. Under these conditions, the TLB ends up with two entries mapping the original page. A demap transaction is required after changing the page table mapping and before any user instructions are issued. The Reference MMU specification requires demap operations for these cases.
Chapter 10


Caches/Store Buffer 





The SuperSPARC processor (SSP) has two large multi-way associative 
caches to provide high performance: 


- 16K-byte data cache.
- 20K-byte instruction cache.


SuperSPARC also implements an eight-doubleword store buffer. The store 
buffer functions to hide most of the latency on writes to memory. 


Topic Page 
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10.1 Introduction 


Cacheability 




The SSP contains a 20K-byte instruction cache memory and a 16K-byte data
cache memory. Cache memories function by keeping copies of recently used
data from main memory. Since the caches are small and close to the pipeline,
they support high performance by supplying data or instructions much more
quickly than is possible from main memory. They also have the secondary
effect of reducing the number of transactions on the system bus, which is a
benefit in multiprocessor systems.


The SSP's caches are organized with separate instruction and data caches, 
which is often called a Harvard architecture. The advantage of a Harvard archi- 
tecture is that the two caches can be accessed at the same time, increasing 
the amount of work that the processor can accomplish in a cycle. 


In configurations with the MultiCache Controller (MXCC), up to 2M bytes of
external cache memory can be used. The organization, control, and
programming of the external cache are described in Chapter 16.


The SSP also contains a store buffer that accepts stores, allowing the pipeline 
to continue processing instructions beyond the store. Stores in the buffer are 
passed to the bus when the bus is available and are removed from the buffer 
as they complete. The store buffer provides support for the total store ordering 
(TSO) and partial store ordering (PSO) memory models (see Chapter 8). 


Each load or store operation is either cacheable or non-cacheable, a
difference that governs how the access is processed. Cacheable operations
check the caches before accessing main memory or I/O devices, and may bring a
copy of the data into the cache if it is not already there. Non-cacheable
operations do not check the cache for the data before accessing main memory or
I/O devices. Non-cacheable operations do not copy data into or out of the
cache.


Generally, only cacheable accesses participate in cache coherence. In MBus
systems, cacheable and non-cacheable accesses to the same address access
the same memory location, while in XBus systems cacheable and
non-cacheable address spaces generally do not overlap.


The cacheability of memory operations originating from an SSP is determined 
by the Memory Management Unit (MMU), according to control bits in the 
MCNTL register and the page table entry (PTE). There are three sources of 
memory references in the SSP: instruction fetches, load/store instructions, 
and the MMU. Table 10-1 summarizes the principal factors governing the 
cacheability of each. 
Table 10-1. Cacheability Summary

Source                    | Cacheability Controls       | Details
Instruction Fetch         | MCNTL.IE, MCNTL.AC, PTE.C   | Table 10-2
Load/Store Instructions   | MCNTL.DE, MCNTL.AC, PTE.C   | Table 10-5
MMU                       | -                           | -





Cache Consistency and Snooping 


In multiprocessor systems, and even in uniprocessor systems with DMA I/O,
it is important that accesses to a datum from different places in the system all
get the same value for the datum. With caches, there may be several copies
of a datum in different caches, in addition to the copy in main memory. Ensuring
that all accesses see a consistent time series of values for a datum requires
cache-consistency support in caches and on the system interconnect (bus).


In bus-based systems, all caches can easily observe all transactions on the
shared system bus. In this environment, cache-consistency protocols can be
used that rely on each cache observing every bus transaction, even those for
which it is neither the source nor the destination. This observation of the bus
is called “bus snooping” and is critical to the cache-coherence protocols
supported by the SuperSPARC processor.


The particular cache-coherence protocol used by the SuperSPARC chipset 
varies depending on the bus in use. For details related to the particular buses 
supported, see Chapter 17, Chapter 18, or Chapter 19. 


The store buffer is a kind of cache and must snoop bus transactions. Data in 
the store buffer has been modified but not yet communicated to the rest of the 
system. If another processor's cache attempts to access a datum that is con- 
tained in the store buffer, the buffer must act as the owner of the data and sup- 
ply it. 
Consistency and Snooping Between Instruction Cache, Data Cache, and Store Buffer
The two internal caches and the store buffer all snoop internal as well as
external transactions. Usually this snooping occurs on the bus and is visible on
the bus pins of the SSP. For example, in the direct MBus configuration, if the
instruction cache attempts to read a sub-block that is dirty in the data cache,
the read will appear on MBus, and the data cache will assert MIH on MBus to
indicate that it will supply the data.


The store buffer snoops data cache transactions. If the data cache attempts to
access data that is still in the store buffer, the store buffer holds the data
cache until the store has been acknowledged by main memory.


10.2 Instruction Cache


The SuperSPARC processor has an internal instruction cache to support
high-performance execution. It features:

❑ 20K-byte total capacity.
❑ Five-way set-associative organization with 64 sets.
❑ Tags containing physical addresses.
❑ 64-byte block size.
❑ Blocks divided into 32-byte sub-blocks.
❑ 128-bit (four-instruction) access for instruction fetch.
❑ Prefetching.


The instruction cache is organized as shown in Figure 10-1. The cache is five- 
way set associative with 64 sets. Physical address bits PA[11:6] select one of 
the 64 sets. Each set stores five blocks. Each set has a set tag (STag) that con- 
tains the usage and lock information for the five blocks in the set (see 
Figure 10-6). 
Figure 10-1. Instruction Cache Organization
Each of the five blocks in a set has a physical tag (PTag) and two sub-blocks
that hold the cached instructions. The PTag contains the upper 24 bits of the
physical address of the data currently contained in the block, if any, and a
valid bit for each of the sub-blocks. The even sub-block stores data with
PA[5]=0, and the odd sub-block stores data with PA[5]=1. See Figure 10-5 for
the PTag format.
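To make the address fields concrete, the following C sketch splits a 36-bit physical address into the pieces the instruction cache uses: the 24-bit tag compared with the PTag (PA[35:12]), the set index (PA[11:6]), the sub-block select (PA[5]), and the instruction within the 32-byte sub-block (PA[4:2]). The struct and function names are hypothetical; only the bit positions come from the text.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decomposition of a physical address for the instruction cache. */
typedef struct {
    uint32_t tag;      /* PA[35:12]: 24 bits, compared with the PTag address */
    unsigned set;      /* PA[11:6]: one of 64 sets */
    unsigned subblock; /* PA[5]: 0 = even sub-block, 1 = odd sub-block */
    unsigned word;     /* PA[4:2]: instruction within the 32-byte sub-block */
} icache_fields;

static icache_fields icache_split(uint64_t pa)
{
    icache_fields f;

    assert(pa < (1ULL << 36));                  /* physical addresses are 36 bits */
    f.tag      = (uint32_t)((pa >> 12) & 0xFFFFFFu);
    f.set      = (unsigned)((pa >> 6) & 0x3Fu);
    f.subblock = (unsigned)((pa >> 5) & 0x1u);
    f.word     = (unsigned)((pa >> 2) & 0x7u);
    return f;
}
```

For example, PA 0x123456789 maps to tag 0x123456, set 30, the even sub-block, and word 2 of that sub-block.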


On an instruction fetch, if a valid block is found in the instruction cache for the
fetch address, the cache will send up to 128 bits (four sequential instructions) 
to the instruction queue (see Table 10-2). The number of instructions sent will 
be limited by the block boundary or the sub-block boundary if the odd sub- 
block is invalid. 
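The limit on how many instructions one hit can deliver can be sketched as a small function. This is an illustration under stated assumptions, not the hardware logic: it assumes the selected sub-block is valid (otherwise the fetch is a miss) and counts instructions as 32-bit words, 16 per 64-byte block, with words 0-7 in the even sub-block and 8-15 in the odd one. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: instructions sent to the IQ for a hit at pa. */
static unsigned icache_fetch_count(uint64_t pa, bool odd_valid)
{
    unsigned word = (unsigned)((pa >> 2) & 0xFu);  /* PA[5:2]: word within block */
    unsigned limit = 16u - word;                   /* stop at the block boundary */

    assert((pa & 3u) == 0u);                       /* instructions are word aligned */
    if (word < 8u && !odd_valid)
        limit = 8u - word;                         /* stop at the sub-block boundary */
    return limit < 4u ? limit : 4u;                /* at most 128 bits per fetch */
}
```

A fetch near the end of a block, or near the end of the even sub-block when the odd one is invalid, therefore delivers fewer than four instructions.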


The cache is enabled by the MCNTL.IE bit of the MMU control register (see 
Subsection 9.12.1). The cache is disabled at power-on (see Section 13.2, 
Hardware Reset) and remains disabled until the MCNTL.IE bit is set. 

At power-on, the contents of the instruction cache are undefined. It is the
responsibility of the software to initialize the instruction cache by resetting the
valid bits (see Subsection 10.3.1).
10.2.1 Instruction Cache Access


These are the steps that compose an instruction cache access:

1) The virtual address is translated by the MMU to produce a physical
address, PA[35:0]. An instruction access exception may be raised by the
MMU.

2) PA[11:6] selects one of the 64 sets.

3) The addresses in the PTags of the five blocks in the set are compared to
PA[35:12], ignoring any invalid sub-blocks.

4) If there is a matching block address and the sub-block selected by PA[5]
is valid in the PTag, the block is selected. Otherwise, the access is a cache
miss, and the sub-block is read from memory.

5) PA[5:2] selects the first four-byte instruction from the matching sub-block.
It is aligned along with other sequentially successive instructions within
the valid sub-blocks of the selected cache block. The resulting aligned
instructions, up to four instructions (128 bits), are sent to the instruction
queue.


10.2.2 Cacheability


Whether the instruction fetches are cached depends on the operating mode 
of the MMU, as set in the MMU control register (MCNTL), and cacheable bits 
within each page table entry (PTE). See Table 10-2. 
Table 10-2. Instruction Cacheability

Mode                  | MCNTL.IE=0  | MCNTL.IE=1       | MCNTL.IE=1
                      | MCNTL.AC=X  | MCNTL.AC=0       | MCNTL.AC=1
Boot Mode             | Not Cached  | Not Cached       | Not Cached
  MCNTL.BT=1          |             |                  |
  MCNTL.EN=X          |             |                  |
MMU Disabled          | Not Cached  | Not Cached       | Cached
  MCNTL.BT=0          |             |                  |
  MCNTL.EN=0          |             |                  |
MMU Enabled           | Not Cached  | C=1: Cached      | C=1: Cached
  MCNTL.BT=0          |             | C=0: Not Cached  | C=0: Not Cached
  MCNTL.EN=1          |             |                  |

BT = Boot Mode bit of the MMU control register.
EN = MMU Enable bit of the MMU control register.
IE = Instruction Cache Enable bit of the MMU control register.
AC = Alternate Cacheable bit of the MMU control register.
C = Cacheable bit kept in each Page Table Entry.
X = "don't care"







The entries in the table labeled "Not Cached" mark cases where the instruction 
cache is not accessed and instructions read from memory are not entered into 
the instruction cache. Entries marked "Cached" are for cases where instruc- 
tions read from memory are entered into the instruction cache, and the cache 
supplies the instructions, if they are present. 
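The decision in Table 10-2 can be written as a short C function. This is a sketch of one reading of the table, not the hardware implementation; in particular, the first row (boot mode, MCNTL.BT=1) is reconstructed from a damaged scan. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* MCNTL bits that take part in instruction-fetch cacheability. */
typedef struct {
    bool bt;  /* MCNTL.BT: boot mode */
    bool en;  /* MCNTL.EN: MMU enable */
    bool ie;  /* MCNTL.IE: instruction cache enable */
    bool ac;  /* MCNTL.AC: alternate cacheable */
} mcntl_bits;

/* Returns true when Table 10-2 says "Cached". pte_c is the PTE's C bit and
 * is only consulted when the MMU is enabled. */
static bool ifetch_cacheable(mcntl_bits m, bool pte_c)
{
    if (!m.ie)
        return false;  /* instruction cache disabled */
    if (m.bt)
        return false;  /* boot mode row (assumed reading of the table) */
    if (!m.en)
        return m.ac;   /* MMU disabled: no PTE, so MCNTL.AC decides */
    return pte_c;      /* MMU enabled: the PTE's C bit decides */
}
```

Note that once the MMU is enabled, MCNTL.AC no longer matters for instruction fetches; only the C bit of the PTE does.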


10.2.3 Instruction Cache Miss Processing 
Miss processing occurs when a demand fetch fails to match a valid sub-block
tag in the cache. Prefetches (see Subsection 10.2.4) never provoke miss
processing.


In miss processing, the required sub-block is read in from the next level of the 
memory hierarchy and forwarded to the instruction queue. Simultaneously, the
new sub-block is placed in the cache. This may require that a valid block be 
displaced from the cache. Since the instruction cache is five-way set-associa- 
tive, there are five candidates for replacement. The choice among the candi- 
dates is controlled by the replacement policy. 
Instruction Cache Replacement Policy


A limited-history algorithm determines which blocks in the cache will be
replaced. The STag contains a usage bit for each block of the set and a lock bit
for all but block 0. Whenever an instruction fetch hits in the cache, the usage
bit is set. If all the other usage bits in the STag are already set, they are all
cleared. All five bits are never set at the same time, and the usage bit of the
most recently used block will always be set. When the usage bits are reset, the
history begins accumulating again. In this way, a limited record of the most
recently used members of a set can be kept.


The maintenance of the usage bits is modified by the lock bits. For the purpose 
of usage bit updates, the usage bit for a locked block is treated as if it were 
always set. 


A block that is locked into the cache is never selected for replacement. Block 
0 cannot be locked; therefore, if all other entries are locked, it will always be 
selected for replacement, regardless of the state of the usage bits. 


If there is more than one unlocked block, the usage bits determine which block
should be replaced. If there is an unlocked block with a cleared usage bit, it
is a candidate for replacement. Among the candidates, the block to replace is
selected by a fixed priority: block 4 has the highest priority, and block 0 the
lowest. The instruction cache therefore fills starting at block 4, then
progresses to 3, 2, 1, and finally 0.
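The policy can be sketched in C for a set of n blocks (n = 5 for the instruction cache; the data cache described in Subsection 10.4.4 uses the same scheme with n = 4). The `stag` type and function names are hypothetical. One assumption goes beyond the text: filling a block on a miss is treated as a use of that block, which reproduces the 4, 3, 2, 1, 0 fill order described above when starting from cleared usage bits.

```c
#include <assert.h>
#include <stdint.h>

/* Bit b of use/lock refers to block b; bit 0 of lock is always 0. */
typedef struct {
    uint8_t use;   /* usage bits */
    uint8_t lock;  /* lock bits */
} stag;

/* Record a hit (or, by assumption, a fill) of block b. */
static void stag_touch(stag *s, unsigned n, unsigned b)
{
    assert(b < n);
    uint8_t all = (uint8_t)((1u << n) - 1u);
    uint8_t others = (uint8_t)(all & ~(1u << b));
    uint8_t eff = (uint8_t)(s->use | s->lock);   /* locked blocks count as used */

    if ((eff & others) == others)
        s->use = (uint8_t)(1u << b);             /* history full: restart it */
    else
        s->use = (uint8_t)(s->use | (1u << b));
}

/* Pick the replacement victim: highest-numbered block whose effective usage
 * bit is clear. Locked blocks are never chosen. */
static unsigned stag_victim(const stag *s, unsigned n)
{
    uint8_t all = (uint8_t)((1u << n) - 1u);
    uint8_t eff = (uint8_t)((s->use | s->lock) & all);

    assert((s->lock & 1u) == 0u);                /* block 0 cannot be locked */
    for (unsigned b = n; b-- > 0;)
        if (!(eff & (1u << b)))
            return b;
    return 0;  /* all others locked and block 0 recently used: block 0 anyway */
}
```

Starting from an empty set, alternating `stag_victim` and `stag_touch` yields the victims 4, 3, 2, 1, 0, 4, and with blocks 1 through 4 locked the victim is always block 0.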


Case One and Case Two of Example 10-1 illustrate two replacement
scenarios.
Example 10-1. Instruction Cache Replacement

[Figure: Case One and Case Two, each showing the USE and LCK bits of a set
and the resulting replacement candidates]

USE = Usage: STag bits providing a limited history of used blocks.
LCK = Lock: STag bits providing information on locked blocks.


In Example 10-1, Case One, based on the replacement candidate line, block 
2 is chosen for replacement. In Case Two, block 3 is chosen, since it is the left- 
most available block. Note again that block 0 can never be locked. This is to 
ensure that cacheable data may always be stored in the cache (when the 
cache is enabled). 


Snoop Hits and Lock Bits 
Cache-consistency transactions use physical addresses, so the PTags are
consulted for address comparison. A snoop hit occurs when a cacheable bus
transaction matches the PTag for a block in the cache. A block will be
invalidated (valid bits cleared) when there is a snoop hit on VBus or an
invalidation transaction on MBus. Invalidation transactions include coherent
read invalidate (CRI), coherent invalidate (CI), and coherent write invalidate
(CWI).


If a block is invalidated and its lock bit is set, the lock bit will not be cleared.
Because a locked block is never selected for replacement, the invalidated block
cannot be reused, which reduces the number of active blocks in that set by 1.


Note:
If a block is to be locked, ensure that it will not be invalidated.


See Subsection 10.2.5 for further information on cache consistency opera- 
tions in the instruction cache. 
Instruction Cache Miss Penalties and Timing


When an instruction cache miss occurs, a number of events occur that can af- 
fect instruction processing timing. First, since the instruction cache does not 
contain the requested instruction, there is no instruction to send to the
instruction queue. If the request was a demand fetch (not a prefetch), the instruction
queue will become empty, and bubbles (empty instruction groups) will be is- 
sued into the pipeline until the instruction is available for decoding. Even if the 
fetch request was for a prefetch, it is likely that the instruction queue will be- 
come empty before the miss is serviced from memory. 


Once a miss is detected, the instruction cache arbitrates for the bus interface 
(other requesters might be snoop requests, data cache bus accesses, the 
store buffer, and MMU bus accesses). Once the bus interface is available, ar- 
bitration is required again, this time to acquire access to the bus. The method 
of arbitration depends on the bus in use, either MBus (see Chapter 17) or VBus 
(see Chapter 18). After access to the bus has been obtained, the requested 
sub-block is read. Once the data returns, it is sent to the instruction queue and 
the cache simultaneously. 


When the data for an instruction cache miss is found in the external cache
(VBus configuration), it takes a minimum of … cycles more than an instruction
cache hit.


10.2.4 Instruction Prefetching and the Instruction Queue 


The instruction prefetcher supplies instructions directly to the Instruction 
Queue (IQ) without buffering. The IQ comprises an eight-word first-in, first-out 
(FIFO) sequential instruction queue and a four-word branch target queue (see 
Figure 10-2). The IQ provides instructions for the pipeline to execute. The
pipeline can extract up to three instructions per cycle from the IQ. The pre- 
fetcher continuously tries to fill the IQ with the instructions, either from the in- 
struction cache (on a cache hit) or from memory (on a cache miss), according 
to the next sequential address location. When a control transfer instruction 
(CTI) is encountered, the prefetcher immediately retrieves the target instruc- 
tions and places them in the branch target queue. If the CTI is taken, the pipe- 
line will extract instructions from the branch target queue; if the CTI is not tak- 
en, the branch target queue will be ignored, and fetching will continue from the 
sequential instruction queue. 
Figure 10-2. Instruction Queue
A demand fetch is a memory access that occurs when the processor needs
instructions that are not currently available in the IQ. This can occur due to
changes in the program counter during exceptions, traps, or CTIs. It also
occurs when an instruction is required before it has been delivered to the IQ.


Only a demand fetch—not a prefetch—can initiate an MMU table walk or 
cause an exception. If a required translation is not present in the translation 
lookaside buffer (TLB), the prefetch logic will abandon the prefetch attempt. 
A demand fetch may be required later for the same address translation, and 
the MMU will perform the table walk at this time. 


Instruction prefetching is always enabled when the instruction cache is on;
there is no explicit control bit. Instruction prefetching for SuperSPARC works
identically on VBus and MBus. Errors may occur during prefetching; see
Section B.6 for details on prefetch exception handling.
10.2.5 Instruction Cache Consistency


The instruction cache snoops transactions on MBus or VBus and is fully con- 
sistent with the internal data cache and all external caches obeying the cache- 
consistency protocol for that bus. Instruction cache consistency is maintained 
in hardware. 


Cache-consistency transactions use physical addresses, so the PTags are 
consulted for address comparison. A snoop hit occurs when a cacheable bus 
transaction matches the PTag for a block in the cache. A block will be invali- 
dated (valid bits cleared) when there is a snoop hit on VBus or an invalidation 
transaction (i.e., CRI, CI, CWI) on MBus.


The instruction cache does not allow writes, so it never becomes an owner of 
data, and it never needs to transmit its contents back to the system bus. All 
instruction cache snoop hits are handled by invalidating the appropriate cache 
entries. This includes snoop hits by transactions generated by load and store 
operations on the same processor. The invalidations are executed as long as 
the MCNTL.SE (snoop enable) bit is set. 


Self-Modifying Code 


The instruction cache cannot be written directly. Instead, the normal cache- 
consistency mechanism is used between the instruction cache and the data 
cache. When a store instruction is attempted to an address that is present in 
the instruction cache, the data cache must obtain ownership of the block be- 
fore the store can proceed. In order to acquire ownership, any copies of the 
block that exist in any of the other caches in the system must be invalidated. 
The instruction cache will see this invalidation due to its snooping of all bus ac- 
tivity, and it will invalidate its copy of the block. Should the instruction cache 
attempt to fetch from the block after the invalidation, the cache-consistency 
protocol for the bus will ensure that the instruction cache receives the new data
from the data cache. 


A FLUSH instruction must be performed to ensure instruction cache
consistency when you execute self-modifying code. (See Section 7.4 for more
information on FLUSH instructions.) The FLUSH instruction forces the execu- 
tion of all the pending writes and will also flush the instruction prefetch buffer 
and pipeline. The FLUSH instruction executes synchronously, implying that 
the SuperSPARC processor is stalled until all previous memory operations are 
completed. The instruction that was modified prior to the FLUSH instruction 
is guaranteed to execute properly after the FLUSH has been completed. 
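A minimal sketch of the store-then-FLUSH sequence, written in C with GCC-style inline assembly. The helper name `patch_instruction` is hypothetical, and the FLUSH is only emitted when compiling for SPARC (`__sparc__`); on other hosts the function just performs the store. It is not a complete operating-system routine: making the text page writable, for example, is outside the scope of this section.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: overwrite one instruction word, then FLUSH its address
 * so the new instruction is seen by the instruction stream. */
static void patch_instruction(volatile uint32_t *insn, uint32_t new_word)
{
    assert(((uintptr_t)insn & 3u) == 0u);  /* instructions are word aligned */
    *insn = new_word;                       /* ordinary store via the data cache */
#if defined(__sparc__)
    /* Completes pending writes and flushes the prefetch buffer and pipeline. */
    __asm__ volatile("flush %0" : : "r"(insn) : "memory");
#endif
}
```

For example, `patch_instruction(p, 0x01000000u)` replaces the instruction at `p` with a SPARC NOP (`sethi 0, %g0`).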
10.3 Instruction Cache Diagnostic and Control Interfaces


This section describes the low-level diagnostic and control interfaces to the
instruction cache. These interfaces are accessed via LDA, LDDA, STA, and 
STDA instructions to an ASI assigned for this purpose. Table 10-3 shows the 
ASIs used for diagnostic and control access to the instruction cache. 


Table 10-3. Instruction Cache ASIs

ASI    | Description
0x0c   | Instruction Cache Tags
0x0d   | Instruction Cache Data





10.3.1 Instruction Cache Flash Clear (ASI=0x36) 


You can invalidate the entire instruction cache or clear all the lock bits by
issuing a store alternate with the ASI value 0x36. The data size of the store
operation must be a 32-bit word; all other data sizes cause a data access
exception. The data issued by the store operation is ignored. The most significant
bit of the address determines the type of operation. 





Note: 


Flash clear operations should always be executed before enabling the 
instruction cache. 





The address format is shown in Figure 10-3.

Figure 10-3. Instruction Cache Flash Clear Address Format

| TYP |               reserved                |
  31    30                                    0

Bit field explanations:
TYP       Type of operation.

          0: Invalidate. Both valid bits in the PTags of each block in
             the instruction cache are cleared; all USE bits are
             cleared in the STags of each set in the instruction cache.
             The Lock bits of the STags are not affected.

          1: Unlock. All Lock bits in the STags of each set in the
             instruction cache are cleared.

reserved  Reserved. These bits are ignored.
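The flash-clear request can be sketched in C: the address carries only the TYP field in bit 31, and the store itself is a word-sized STA to ASI 0x36 whose data is ignored. The enum and function names are hypothetical, and the inline assembly (GCC syntax) is only compiled for SPARC targets. Alternate-space stores are privileged, so this would run in supervisor code such as a boot loader.

```c
#include <assert.h>
#include <stdint.h>

/* TYP field (address bit 31) of an ASI 0x36 flash-clear store. */
enum iflash_op {
    IFLASH_INVALIDATE = 0,  /* clear valid and USE bits, keep lock bits */
    IFLASH_UNLOCK = 1       /* clear all lock bits */
};

/* Build the flash-clear address: TYP in bit 31, reserved bits left at 0. */
static uint32_t iflash_addr(enum iflash_op op)
{
    assert(op == IFLASH_INVALIDATE || op == IFLASH_UNLOCK);
    return (uint32_t)op << 31;
}

#if defined(__sparc__)
/* Word-sized store alternate to ASI 0x36; the stored data (%g0) is ignored. */
static void iflash_clear(enum iflash_op op)
{
    uint32_t addr = iflash_addr(op);
    __asm__ volatile("sta %%g0, [%0] 0x36" : : "r"(addr) : "memory");
}
#endif
```

Following the note above, initialization code would issue `iflash_clear(IFLASH_INVALIDATE)` before setting MCNTL.IE.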
10.3.2 Instruction Cache Tags (ASI=0x0c)


Instruction cache tags are readable and writable with ASI value 0x0c. This
direct-access capability is provided for diagnostic purposes. 


A PTag is associated with each cache block, and an STag is associated with
each set. The PTags and the STags must be accessed as 64-bit doublewords. 
All other data sizes cause a data access exception. The contents of the 
cache tags are not affected by watchdog reset or hardware reset. 


The tags are addressed as pairs because each instruction cache block has 
both a PTag and an STag. The address format is as follows: 


Figure 10-4. Instruction Cache Tag Address Format

| TYP | r  | BLK   | res   | SET  | res | zero |
 31:30  29   28:26   25:12   11:6   5:3   2:0
The fields of an instruction cache tag address are:
TYP Type of operation.
    0: Reserved. Generates a data access exception.
    1: STag. Access the set tag. BLK is ignored. SET selects
       the set to access.
    2: PTag. Access the physical tag. SET and BLK are used to
       access a particular block's PTag.
    3: Reserved. Generates a data access exception.

r, res Reserved. These bits are ignored.

BLK Block. Selects one of the five blocks in a set (0-4). Blocks 5-7
    do not exist. Attempting to access one of blocks 5-7 generates
    a data access exception.

SET Set selects one of the instruction cache's 64 sets to access. 

zero Zero field. This field should always be zero. 
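A tag address can be assembled from these fields with shifts. The sketch below assumes the bit positions read from Figure 10-4 (TYP in bits 31:30, BLK in 28:26, SET in 11:6, all other bits zero); the resulting address would be used with LDDA or STDA and ASI 0x0c, since tags must be accessed as doublewords. The names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* TYP values that select a tag; 0 and 3 are reserved (data access exception). */
enum itag_typ { ITAG_STAG = 1, ITAG_PTAG = 2 };

/* Hypothetical builder for an ASI 0x0c tag address. */
static uint32_t itag_addr(enum itag_typ typ, unsigned blk, unsigned set)
{
    assert(typ == ITAG_STAG || typ == ITAG_PTAG);
    assert(blk <= 4u);   /* blocks 5-7 do not exist */
    assert(set < 64u);   /* 64 sets */
    return ((uint32_t)typ << 30) | ((uint32_t)blk << 26) | ((uint32_t)set << 6);
}
```

For an STag access BLK is ignored by the hardware, so passing 0 is sufficient.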


PTag Diagnostic Access 


Figure 10-5 shows the instruction cache PTag diagnostic access format. 
Figure 10-5. Instruction Cache PTag Diagnostic Access Format

| res   | Vo | Ve | res   | PADDR |
 63:58   57   56   55:24   23:0

Bit field explanations:
res    Reserved. Read as 0 and ignored on a write.

Vo     Valid bit for the odd sub-block. When this bit is cleared, the odd
       sub-block is invalid. The odd sub-block stores data at addresses
       with PA[5]=1. At power-on (see Section 13.2), this bit is
       undefined.

Ve     Valid bit for the even sub-block. The even sub-block stores data
       at addresses with PA[5]=0. At power-on (see Section 13.2), this
       bit is undefined.

PADDR  Physical Address Tag. Bits [35:12] of the physical address of the
       data contained in this cache block. If neither Vo nor Ve is set,
       the PTag has no meaning. If either or both of Vo and Ve are set,
       the PADDR field is valid.
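A diagnostic routine that has read a PTag doubleword might decode it as in the following sketch, which assumes the bit positions of Figure 10-5 (Vo at bit 57, Ve at bit 56, PADDR in bits 23:0). The type and function names are hypothetical; the match function mirrors the lookup rule that PADDR only counts when the selected sub-block's valid bit is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool vo;         /* bit 57: odd sub-block (PA[5]=1) valid */
    bool ve;         /* bit 56: even sub-block (PA[5]=0) valid */
    uint32_t paddr;  /* bits 23:0: PA[35:12] of the cached block */
} iptag;

static iptag iptag_decode(uint64_t raw)
{
    iptag t;
    t.vo = (raw >> 57) & 1u;
    t.ve = (raw >> 56) & 1u;
    t.paddr = (uint32_t)(raw & 0xFFFFFFu);
    return t;
}

/* True if this PTag holds valid instructions for physical address pa. */
static bool iptag_matches(iptag t, uint64_t pa)
{
    bool odd = (pa >> 5) & 1u;
    bool valid = odd ? t.vo : t.ve;  /* PADDR is meaningless without a valid bit */
    return valid && t.paddr == (uint32_t)((pa >> 12) & 0xFFFFFFu);
}
```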

STag Diagnostic Access
Figure 10-6 shows the instruction cache STag diagnostic access format.


Figure 10-6. Instruction Cache STag Diagnostic Access Format

| res   | USE  | r   | LOCK |
 63:13   12:8   7:5   4:0

Bit field explanations:
res, r  Reserved. Read as 0 and ignored on a write.

USE     Usage bits. This bit field indicates blocks of the set that have
        been accessed recently. The position of the bit in the field
        determines which block it is associated with: bits 8-12 correspond
        to blocks 0-4, respectively. This bit field is updated by the
        hardware replacement algorithm and is used to select a block for
        replacement.

        | USE4 | USE3 | USE2 | USE1 | USE0 |
          12     11     10     9      8

LOCK    Lock bits. Each lock bit can lock a block inside the cache. The
        position of the bit determines which block it locks. Bit 0 is fixed
        to 0 and ignores writes. Bits 1-4 lock blocks 1-4, respectively.
        When a lock bit is set to 1, the corresponding block will not be
        displaced by the replacement algorithm.

        | LOCK4 | LOCK3 | LOCK2 | LOCK1 | 0 |
          4       3       2       1       0
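The STag fields can be extracted or assembled with masks. This sketch assumes the positions of Figure 10-6 (USE in bits 12:8, LOCK in bits 4:0); the function names are hypothetical. When building a value for a diagnostic write, it clears lock bit 0, since the hardware fixes that bit to 0 anyway.

```c
#include <assert.h>
#include <stdint.h>

/* USE field: bits 12:8, one bit per block (bit 8 = block 0). */
static unsigned istag_use(uint64_t raw)
{
    return (unsigned)((raw >> 8) & 0x1Fu);
}

/* LOCK field: bits 4:0 (bit 0 = block 0, which always reads as 0). */
static unsigned istag_lock(uint64_t raw)
{
    return (unsigned)(raw & 0x1Fu);
}

/* Build an STag doubleword for a diagnostic write; lock bit 0 is cleared. */
static uint64_t istag_make(unsigned use, unsigned lock)
{
    assert(use <= 0x1Fu && lock <= 0x1Fu);
    return ((uint64_t)use << 8) | (uint64_t)(lock & 0x1Eu);
}
```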


10.3.3 Instruction Cache Data (ASI=0x0d) 


Instruction cache data is readable and writable with ASI value 0x0d. This
direct-access capability is provided for diagnostic purposes. The blocks must 
be accessed as 64-bit doublewords. Other data sizes cause a data ac- 
cess exception. Instruction cache data is not affected by watchdog reset, 
hardware reset, or flash clear. The address format is shown in Figure 10-7. 


Figure 10-7. Instruction Cache Data Address Format

| res   | BLK   | res   | SET  | DW  | zero |
 31:29   28:26   25:12   11:6   5:3   2:0

The fields of an instruction cache data address are:

res    Reserved. These bits are ignored.

BLK    Block. Designates which of the five blocks (0-4) in the set is
       referenced. Blocks 5-7 do not exist. Attempting to access one of
       blocks 5-7 generates a data access exception.

SET    Set. Selects one of the 64 sets of the cache.
DW     Doubleword. Selects the doubleword within the cache block.
zero   Zero field. This field should always be zero.
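Because a 64-byte block holds eight doublewords, dumping one block takes eight LDDA accesses to ASI 0x0d. The sketch below builds those addresses, assuming the field positions read from Figure 10-7 (BLK in bits 28:26, SET in bits 11:6, doubleword in bits 5:3). The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical builder for an ASI 0x0d instruction cache data address. */
static uint32_t idata_addr(unsigned blk, unsigned set, unsigned dw)
{
    assert(blk <= 4u);  /* blocks 5-7 do not exist */
    assert(set < 64u);  /* 64 sets */
    assert(dw < 8u);    /* 64-byte block = 8 doublewords */
    return ((uint32_t)blk << 26) | ((uint32_t)set << 6) | ((uint32_t)dw << 3);
}
```

A dump loop would call `idata_addr(blk, set, dw)` for `dw` from 0 to 7 and issue one doubleword load per address.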
10.4 Data Cache




The SuperSPARC processor has an internal data cache that features:

❑ 16K-byte total capacity.
❑ Four-way set-associative organization with 128 sets.
❑ Tags containing physical addresses.
❑ 32-byte block size.
❑ No sub-blocks.
❑ Up to 64-bit data access.


The data cache organization is shown in Figure 10-8. The cache is four-way 
set associative with 128 sets. Physical address bits PA[11:5] select one of the 
128 sets. Each set stores four blocks. Each set has an STag that contains the 
usage and lock information for the four blocks in the set (see Figure 10-8). 


Figure 10-8. Data Cache Organization
Cache Block = 32 Bytes
Each of the four blocks in a set has a PTag and a block of 32 bytes that holds
cached data. The PTag contains the upper 24 bits of the physical address of 
the data currently contained in the block, if any, and a valid bit that indicates 
whether the block contains any valid data. The PTag also has shared (S) and 
dirty (D) bits that are used by the cache-consistency protocols. 
The cache is enabled by the MCNTL.DE bit (see Subsection 9.12.1). The
cache is disabled at power-on reset and remains disabled until the MCNTL.DE
bit is set.


At power-on, the contents of the data cache are undefined. It is the responsibil- 
ity of the software to initialize the data cache by using data cache flash clear 
to reset the valid bits. After a watchdog reset, the contents of the data cache
are unmodified. 

10.4.1 Data Cache Access 
The steps in a data cache access are: 


1) The virtual address is translated by the MMU to produce a physical
address, PA[35:0]. A data access exception may be raised by the MMU.

2) PA[11:5] selects one of the 128 sets.

3) The four block tags (PTags) in the set are compared to PA[35:12], ignoring
any invalid blocks.

4) If there is a matching block, it is selected. Otherwise, the access is a cache
miss, and the block is read from memory.

5) PA[4:0] and the size of the transaction determine which bytes are
returned, as shown in Table 10-4.
Table 10-4. Data Bytes of the Data Cache Block Returned from Access
% marks illegal address alignment. These combinations of address and access size provoke a
memory address not aligned exception.

n numbers indicate the byte within the cache block. 

n:n number ranges indicate the range of bytes within the cache block. 
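The access steps above can be sketched as one C function that splits the physical address (PA[35:12] tag, PA[11:5] set) and computes the byte range of the 32-byte block that is returned. The body of Table 10-4 did not survive, so this sketch assumes byte n is simply the block offset PA[4:0] and that any access not aligned to its own size is illegal. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t tag;    /* PA[35:12], compared with the PTags */
    unsigned set;    /* PA[11:5], one of 128 sets */
    unsigned first;  /* first byte of the block returned */
    unsigned last;   /* last byte of the block returned */
} dcache_access;

/* Returns false for a misaligned access (memory address not aligned). */
static bool dcache_decode(uint64_t pa, unsigned size, dcache_access *out)
{
    assert(size == 1u || size == 2u || size == 4u || size == 8u);
    assert(pa < (1ULL << 36));
    if (pa & (size - 1u))
        return false;
    out->tag = (uint32_t)((pa >> 12) & 0xFFFFFFu);
    out->set = (unsigned)((pa >> 5) & 0x7Fu);
    out->first = (unsigned)(pa & 0x1Fu);
    out->last = out->first + size - 1u;
    return true;
}
```

For example, a doubleword load from PA 0x123456788 selects set 60 and returns bytes 8:15 of the block, while a word load from PA 0x123456789 is misaligned.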
10.4.2 Write-Through or Copy-Back Cache


The MCNTL.MB bit is a read-only indicator of whether the SuperSPARC pro- 
cessor is operating directly on the MBus or whether it is connected to the 
MXCC over VBus. This bit reflects the state of the CCMODE pin at reset. When
CCMODE is low at reset, the processor uses VBus to connect to the MXCC, and
the data cache operates as a write-through cache. When the processor is
operating directly on the MBus with no MXCC, the data cache operates as a
copy-back cache with write allocation.


When the processor is used on the VBus, the data cache is operated as a 
write-through cache, and all stores are immediately written to the store buffer. 
The store buffer will send them out to the external cache and main memory as
soon as resources are available. Writes do not allocate a block in the data 
cache if it is not already present. 


When the processor operates directly on the MBus, cache blocks that have 
been modified by the processor are written back to memory only when neces- 
sary because of replacement or snoop reads. The data cache operates with 
write-allocation: when a store miss occurs, the data cache will allocate a block, 
bring in the missing data from memory, and write that data into the cache block. 
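A toy model, not a description of the hardware, makes the difference in memory traffic concrete for repeated stores to one block. The `dblock` type and the functions are hypothetical; "bus writes" counts data leaving the processor toward memory and ignores the store buffer's timing, and snoop-triggered write-backs are not modeled.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool present;          /* block currently in the data cache */
    bool dirty;            /* modified and not yet written back (copy-back only) */
    unsigned bus_writes;   /* writes sent toward memory */
    unsigned bus_reads;    /* block fills from memory */
} dblock;

/* VBus/MXCC configuration: write-through, no write allocation. Every store
 * goes to the store buffer and on to memory; a miss does not allocate. */
static void store_write_through(dblock *b)
{
    b->bus_writes++;
}

/* Direct MBus configuration: copy-back with write allocation. */
static void store_copy_back(dblock *b)
{
    if (!b->present) {     /* store miss: allocate and read the block first */
        b->bus_reads++;
        b->present = true;
    }
    b->dirty = true;       /* memory is updated later, not on every store */
}

/* Replacement writes a dirty block back to memory. */
static void evict(dblock *b)
{
    if (b->present && b->dirty)
        b->bus_writes++;
    b->present = false;
    b->dirty = false;
}
```

Three stores to the same block cost three memory writes under write-through, but one block read plus a single write-back on replacement under copy-back.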


10.4.3 Cacheability 


Data is cached depending on the mode set within the MMU control register 
(MCNTL) and cacheable bits within each PTE, as shown in Table 10-5. 


Table 10-5. Data Cacheability from Loads and Stores

Translation Mode      | MCNTL.DE=0  | MCNTL.DE=1       | MCNTL.DE=1
                      | MCNTL.AC=X  | MCNTL.AC=0       | MCNTL.AC=1
MMU Transparent       | Not Cached  | …                | …
MMU Disabled          | Not Cached  | Not Cached       | Cached
  MCNTL.BT=X          |             |                  |
  MCNTL.EN=0          |             |                  |
MMU Enabled           | Not Cached  | C=1: Cached      | C=1: Cached
  MCNTL.BT=X          |             | C=0: Not Cached  | C=0: Not Cached
  MCNTL.EN=1          |             |                  |

BT = Boot Mode bit of the MMU control register.
EN = MMU Enable bit of the MMU control register.
DE = Data Cache Enable bit of the MMU control register.
AC = Alternate Cacheable bit of the MMU control register.
C = Cacheable bit kept in each Page Table Entry.
X = "don't care"
The entries in the table labeled "Not Cached" mark cases where the data
cache is not accessed and data read from memory is not entered into the
cache. Entries marked "Cached" are for cases where the data cache supplies
the data if it is present in the cache and, if not present, data read from memory
is entered into the cache.


Generally, references are non-cacheable when the data cache is disabled. 
When the cache is enabled, the C bit from the PTE controls cacheability. For 
special accesses (such as MMU pass-through/bypass transactions) without 
a corresponding C bit, the AC (alternate cacheable) bit in the MCNTL register
determines cacheability.
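The rules in this paragraph, together with the MMU Disabled and MMU Enabled rows of Table 10-5, can be summarized in one C function. This is a sketch under those stated rules, not a hardware description; the `has_pte` flag is a hypothetical way to mark accesses, such as MMU pass-through/bypass transactions, that have no PTE C bit.

```c
#include <assert.h>
#include <stdbool.h>

/* de, en, ac: MCNTL.DE, MCNTL.EN, MCNTL.AC.
 * has_pte: false for accesses with no PTE C bit (e.g. pass-through/bypass).
 * pte_c: the PTE's C bit, used only for normal translated accesses. */
static bool data_cacheable(bool de, bool en, bool ac, bool has_pte, bool pte_c)
{
    if (!de)
        return false;   /* data cache disabled: never cached */
    if (!en || !has_pte)
        return ac;      /* no C bit available: MCNTL.AC decides */
    return pte_c;       /* normal translated access: PTE.C decides */
}
```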


The cacheability of data in the internal data cache also controls cacheability 
in the external cache. For MMU table walk references, the MCNTL.TC (table 
walk cacheable) bit indicates external cacheability, even though table walk 
data is never cached internally.


10.4.4 Data Cache Miss Processing 


A data cache miss occurs when PA[35:12] does not match the PADDR field
of a valid PTag (one with V set) for any of the four blocks in the set selected
by PA[11:5]. Miss processing is the series of actions triggered by a cache
miss.


To process a data cache miss, the required block is read in from the next level 
of the memory hierarchy and placed in the cache. This may require that a valid 
block be displaced from the cache. Since the data cache is four-way set- 
associative, there are four candidates for replacement. The choice among the 
candidates is controlled by the replacement policy. 


Data Cache Replacement Policy 
A limited-history algorithm determines which blocks in the cache will be
replaced. The STag contains a usage bit for each block of the set and a lock bit
for all but block 0. Whenever a data reference hits in the cache, the usage bit
of that block is set. If all the other usage bits in the STag are already set, they
are all cleared. All four bits are never set at the same time, and the usage bit
of the most recently used block will always be set. When the usage bits are
reset, the history begins accumulating again. In this way, a limited record of
the most recently used members of a set can be kept.


The maintenance of the usage bits is modified by the lock bits. For the purpose 
of usage bit updates, the usage bit for a locked block is treated as if it were 
always set. 
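A minimal sketch of the usage-bit update on a cache hit, assuming 4-bit masks in which bit n stands for block n (in the STag itself the USE bits occupy bits 11:8). The function name is hypothetical, and the text does not say whether a miss fill also updates the usage bits, so only hits are modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Usage and lock masks: bit n corresponds to block n (n = 0..3).
   Returns the new usage mask after a reference that hits block `blk`.
   Locked blocks are treated as if their usage bit were set. */
static uint8_t dc_touch(uint8_t use, uint8_t lck, unsigned blk)
{
    assert(blk < 4);
    unsigned me     = 1u << blk;
    unsigned others = 0xFu & ~me;
    unsigned eff    = (use | (lck & 0xEu)) & 0xFu;  /* block 0 can never be locked */

    if ((eff & others) == others)
        use = 0;                 /* history is full: clear the other bits */
    return (uint8_t)((use | me) & 0xFu);
}
```

Starting from an empty history, hits on blocks 3, 2, 1, and 0 produce masks 0x8, 0xC, 0xE, and then 0x1: the fourth hit finds all other bits set, so the history restarts.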
A block that is locked into the cache is never selected for replacement. Block
0 cannot be locked; therefore, if all other entries are locked, it will always be
selected for replacement, regardless of the state of the usage bits.


If there is more than one unlocked block, the usage bits determine which block 
should be replaced. If there is an unlocked block with a cleared usage bit, it 
is a candidate for replacement. A candidate is selected for replacement 
according to a fixed priority order. Block 3 has the highest priority order, and 
block 0 the lowest. The data cache therefore starts filling at block 3, then prog- 
resses to 2, 1, and finally 0. 
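The selection rule can be sketched the same way, again with 4-bit masks where bit n stands for block n and a hypothetical function name. Apart from the all-locked case, the text does not say what happens if every unlocked block has its usage bit set; the fallback below is an assumption chosen to agree with that case.

```c
#include <assert.h>
#include <stdint.h>

/* Picks the block (0-3) to replace, given the set's usage and lock masks
   (bit n = block n). Block 3 has the highest priority, block 0 the lowest. */
static unsigned dc_victim(uint8_t use, uint8_t lck)
{
    unsigned locked = lck & 0xEu;                 /* block 0 can never be locked */

    for (int blk = 3; blk >= 0; blk--) {
        unsigned bit = 1u << blk;
        if (!(locked & bit) && !(use & bit))
            return (unsigned)blk;                 /* unlocked, not recently used */
    }
    /* Assumption: if every unlocked block has its usage bit set, take the
       highest-priority unlocked block. With blocks 1-3 locked this is
       always block 0, as the text requires. */
    for (int blk = 3; blk >= 1; blk--)
        if (!(locked & (1u << blk)))
            return (unsigned)blk;
    return 0;
}
```

With an empty history and no locks the function returns 3, consistent with the fill order 3, 2, 1, 0 described above.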


Cases One and Two of Example 10-2 illustrate two replacement schemes. 


Example 10-2. Data Cache Replacement

[Case One and Case Two: USE and LCK bits for blocks 3 to 0, and the resulting Replacement Candidate row]

USE = Usage. STag bits providing a limited history of used blocks.
LCK = Lock. STag bits indicating which blocks are locked.


In Example 10-2, Case One, based on the replacement candidate line, block
2 is chosen for replacement. In Case Two, block 3 is chosen, since it is the
leftmost available block. Note again that block 0 can never be locked. This
ensures that cacheable data may always be stored in the cache (when the
cache is enabled).
Snoop Hits and Lock Bits


Cache-consistency transactions use physical addresses, so the PTags are
consulted for address comparison. A snoop hit occurs when a cacheable bus
transaction matches the PTag for a valid block in the cache. A block is
invalidated (valid bit cleared) when there is a snoop hit on VBus or an
invalidation transaction (CRI, CI, or CWI) on MBus.


If a block is invalidated and the lock bit is set, the lock bit will not be cleared. 
This prevents the cache block from being used, thus reducing the number of 
active cache blocks by 1. 




Note: 
If a block is to be locked, ensure that it will not be invalidated. 


See Subsection 10.4.6 for further information on cache-consistency opera- 
tions in the data cache. 


10.4.5 Data Cache Miss Timing 


The timing for a data cache miss depends on the bus in use and whether a 
block in the data cache needs to be replaced. The following summarizes sev- 
eral of the common cases. None of the timings given accounts for any delays 
in obtaining access to the SSP's bus interface or to the bus itself, so all should 
be considered minimum delays. 


VBus with External Cache 




Following is the sequence for a data cache miss in the VBus with external
cache configuration. No block replacement is needed because internal data
cache blocks are never dirty.


1) LD instruction accesses TLB and data cache. The cache miss is detected.
This cycle is where the data would be accessed for a cache hit.


2) LD instruction frozen in E0. This is the first extra clock cycle.


3) VBus command word cycle. The SSP issues a 32-byte read on VBus. The
external cache SRAMs are pipelined. Like MXCC, they latch the address
on this cycle.


4) The external cache SRAMs are pipelined. SRAMs work internally during
this cycle, using the address latched from the previous cycle. MXCC also
determines external cache hit or miss during this cycle.
5) The external cache SRAMs return the first doubleword in this cycle.


6) In the following cycle, the data cache writes the first doubleword of data 
into the cache and also sends the data to the LD instruction, which com- 
pletes with its WB stage. 


So, when a load misses in the internal cache, there is a five-cycle stall for data
from the second-level cache. Stores do not write-allocate when external cache
is used and proceed immediately to the store buffer without any stalls.


The next three doublewords of data are returned after the LD instruction has
been satisfied and while the pipeline proceeds with other instructions. The
additional doublewords of the cache block keep the VBus busy for three
additional cycles.


See Chapter 18 for more information on bus cycles in this configuration. 


Direct MBus When No Block Replacement Needed


For a data cache miss in the direct MBus configuration when no block
replacement is needed (the block is not dirty), the sequence is:


1) LD instruction accesses TLB and data cache. The cache miss is detected. 
This cycle is where the data would be accessed for a cache hit. 


2) LD instruction frozen in E0. This is the first extra clock cycle.


3) MBus command word cycle. The SSP issues a 32-byte coherent read on 
MBus. 


4) The memory returns the first doubleword. The number of cycles after the
command word cycle depends on the system design and memory speed.


5) In the following cycle, the data cache writes the first doubleword of data
into the cache and also sends the data to the LD instruction, which
completes with its WB stage.


So the miss penalty for MBus when no replacement is needed is three cycles 
plus the memory system's command word to first data delay. 


The next three doublewords of data are returned after the LD instruction has
been satisfied and while the pipeline proceeds with other instructions. The
additional doublewords of the cache block keep the MBus busy for additional
cycles.
MBus With Block Replacement


For a data cache miss in the direct MBus configuration when block replace- 
ment is needed (the block is marked dirty), the sequence is: 


1) LD instruction accesses TLB and data cache. The cache miss is detected.
This cycle is where the data would be accessed for a cache hit.


2) LD instruction frozen in E0. This is the first extra clock cycle. In this cycle,
the SSP also issues a command word cycle on MBus to start a 32-byte
write.


3) In the next four cycles, the SSP sends the four doublewords of the re- 
placed block to memory via MBus. 


4) The SSP may have to wait for all doublewords to be acknowledged on 
MBus. 


5) MBus is idle for one cycle after the last doubleword of the 32-byte write is 
acknowledged. 


6) The SSP issues a command word cycle on MBus for a 32-byte coherent 
read. 


7) The memory returns the first doubleword. The number of cycles after the
command word cycle is dependent on the system design and memory
speed.


8) In the following cycle, the data cache writes the first doubleword of data
into the cache and also sends the data to the LD instruction, which
completes with its WB stage.


The miss penalty for MBus when replacement is needed is eight cycles plus 
the memory system's command word to first data delay and any delays that 
the memory controller inserts into the burst write. 


The next three doublewords of data are returned after the LD instruction has 
been satisfied and while the pipeline proceeds with other instructions. The 
additional doublewords of the cache block keep the MBus busy for additional 
cycles. 
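The three cases above can be summarized as a small cost model. This is a sketch under the same assumptions as the text: it ignores delays in obtaining the bus, so the results are minimums; `mem_delay` (command word to first data) and `write_waits` (cycles the memory controller inserts into the burst write) are hypothetical parameters supplied by the system designer; and the VBus case assumes the block is present in the external cache.

```c
#include <assert.h>

/* Hypothetical labels for the three cases described above. */
enum miss_case { VBUS_ECACHE, MBUS_NO_REPLACE, MBUS_REPLACE };

/* Minimum extra cycles a load stalls on a data cache miss.
   mem_delay:   cycles from MBus command word to first data.
   write_waits: cycles the memory controller inserts into the burst write.
   Both are ignored for VBus (assumes an external cache hit). */
static unsigned dc_miss_penalty(enum miss_case c, unsigned mem_delay,
                                unsigned write_waits)
{
    switch (c) {
    case VBUS_ECACHE:     return 5;                           /* steps 2-6 */
    case MBUS_NO_REPLACE: return 3 + mem_delay;               /* freeze, command, fill */
    case MBUS_REPLACE:    return 8 + mem_delay + write_waits; /* copy-back first */
    }
    assert(0 && "unknown miss case");
    return 0;
}
```

For example, with a hypothetical 6-cycle memory delay, a direct MBus miss costs 9 cycles without replacement and at least 14 cycles with replacement.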


10.4.6 Data Cache Consistency 
In a multiple-processor system, a mechanism must exist to keep local caches
consistent with each other and with main memory. The SuperSPARC proces- 
sor uses a protocol for this purpose that is implemented in hardware. Part of 
this protocol involves snooping bus transactions. The purpose of snooping is 
to ensure that the contents of the data cache are consistent with external 
caches and main memory. Both the data cache and the store buffer snoop for 
incoming transactions that request data invalidation. 
All addresses that are snooped are compared with the cache tags in the
instruction cache and the data cache. Cache-consistency transactions use
physical addresses, so the PTags are consulted for address comparison. A
snoop hit occurs if the address presented on the bus matches a valid PTag in
the cache set to which the address belongs. Cache snooping is controlled by
MCNTL.SE (snoop enable).


The action taken on a snoop hit depends on whether the SSP is being used
with the MXCC on VBus or is connected directly on MBus. When SuperSPARC
is used with the MXCC, preserving this consistency is fairly simple; an entry
in the cache is invalidated if some other processor writes to this location.
Chapter 18 explains the VBus protocol.


For SuperSPARC on MBus, the actions taken on a snoop hit are more complex
and may include responding to the bus transaction with the current cache
contents. The data cache responses to external transactions are listed in
Table 10-6. Chapter 17 has a more complete description of the MBus
cache-consistency protocol.


Table 10-6. Data Cache Snoop Mechanism (MBus)

[Table: data cache response (for example, copy out data and/or invalidate entry) to each MBus transaction being snooped (R, W, CR, CRI, CI, CWI), by state of the accessed block (hit or miss, owned, shared)]

R = Read (non-coherent)                 W = Write (non-coherent)
CR = Coherent Read                      CI = Coherent Invalidate
CRI = Coherent Read and Invalidate      CWI = Coherent Write and Invalidate

(Refer to the SPARC MBus Specification for definitions of these terms.)
Table 10-6 shows the responses of the data cache to various MBus
transactions, depending on the state of the block in the cache that is accessed.
This state is held in the shared and dirty bits of the cache tag.




Note: 


The dirty bit being set indicates ownership; dirty bit is another name for 
owned bit. 





The shared bit is set in the cache tag if the data is also present in another pro- 
cessor's cache. The dirty bit is set if the data has been modified by the proces- 
sor and the changes have not yet been written to memory. The dirty bit will be 
set only if the processor is operating directly on the MBus. 


Consistency with MMU Accesses 


Accesses initiated by the MMU, either reads for table walks or writes for R&M
updates, access the bus directly (they never access internal caches). When
software accesses page tables in memory, the accesses might be cacheable.
This can lead to problems because the MMU will not access the copy cached
in the data cache. Software must therefore access the page tables
appropriately. The options for proper operation differ between direct MBus
and VBus configurations.


In direct MBus configurations, reads and writes initiated by the MMU do not
snoop the data cache. Thus page tables must be accessed through
non-cacheable memory space to ensure consistency. There is no external
cache, so MCNTL.TC must be deasserted.


In VBus configurations, the write-through nature of the data cache keeps the
data cache and the external cache consistent. Thus software may make
cacheable accesses to the page tables, and the MMU will access the entries in
external cache if MCNTL.TC is set. Alternatively, software may access the
tables as non-cacheable data, and the MMU will access the entries in main
memory if MCNTL.TC is clear.


Consistency with Instruction Accesses 
See Subsection 10.2.5.
10.5 Data Cache Diagnostic and Control Interfaces


This section describes the low-level diagnostic and control interfaces to the
data cache. These interfaces are accessed via LDA and STA instructions to
ASIs assigned for this purpose. Table 10-7 shows the ASIs used for diagnostic
and control access to the data cache.


Table 10-7. Data Cache ASIs

Function                    ASI
Data Cache Tags             0x0E
Data Cache Data             0x0F
Data Cache Flash Clear      0x37





10.5.1 Data Cache Flash Clear (ASI=0x37)


You can invalidate the entire data cache or clear all the lock bits by issuing a
store alternate with the ASI value 0x37. The data size of the store operation
must be a 32-bit word; all other data sizes cause a data_access_exception.
The data issued by the store operation is ignored. The most significant bit of
the address determines the type of operation.





Note: 


Flash clear operations should always be executed before enabling the data
cache.





The address format is shown in Figure 10-9. 
Figure 10-9. Data Cache Flash Clear Address Format

TYP [31] | reserved [30:0]


TYP Type of operation. 


0: Invalidate. The valid bit in each of the four PTags of each
set in the data cache is cleared. The other fields of the
PTag (D, S, and PADDR) are not affected; all USE bits
are cleared in the STags of each set in the data cache.
The LCK bits of the STags are not affected.


1: Unlock. All LCK (lock) bits in the STag of each set in the
data cache are cleared. The USE bits of the STag are not
affected.


reserved Reserved. These bits are ignored on write and read as zero. 
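The following sketch shows how the address operand for a flash clear might be formed and issued. `dc_flash_addr` and `dc_flash_clear` are hypothetical helper names; the inline assembly assumes a GCC-style SPARC toolchain and is compiled only when `__sparc__` is defined. It stores `%g0` because the stored data is ignored, and it uses STA so that the access is a 32-bit word, as required.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; TYP (address bit 31) selects the operation. */
enum dc_flash_op { DC_FLASH_INVALIDATE = 0, DC_FLASH_UNLOCK = 1 };

/* Address operand for a flash clear to ASI 0x37; bits 30:0 are reserved
   and left zero. */
static inline uint32_t dc_flash_addr(enum dc_flash_op op)
{
    assert(op == DC_FLASH_INVALIDATE || op == DC_FLASH_UNLOCK);
    return (uint32_t)op << 31;
}

#if defined(__sparc__)
/* Word-sized store alternate to ASI 0x37. The stored value (%g0) is
   ignored by the cache; only the address matters. */
static inline void dc_flash_clear(enum dc_flash_op op)
{
    __asm__ __volatile__("sta %%g0, [%0] 0x37"
                         : /* no outputs */
                         : "r"(dc_flash_addr(op))
                         : "memory");
}
#endif
```

Because flash clears should precede enabling the data cache, initialization code might issue both an invalidate and an unlock before setting MCNTL.DE.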


10.5.2 Data Cache Tags (ASI=0x0e) 


Data cache tags are readable and writable with ASI value 0x0e. This
direct-access capability is provided for diagnostic purposes.


A PTag is associated with each cache block, and an STag is associated with
each set. The PTags and the STags are accessed as 64-bit doublewords. All
other data sizes cause a data_access_exception. The contents of the cache
tags are not affected by watchdog reset or hardware reset.


To access all of the tag information for a block, both the PTag and STag must 
be accessed. The data cache tag address format is shown in Figure 10-10. 
Figure 10-10. Data Cache Tag Address Format

TYP [31:30] | res [29:28] | BLOCK [27:26] | res [25:12] | SET [11:5] | res [4:3] | zero [2:0]
The fields of data cache tag address are:
TYP Type of operation.
0: Reserved. Generates a data_access_exception.
1: STag. Accesses the set tag. BLOCK is ignored. SET selects
the set to access.
2: PTag. Accesses the physical tag. SET and BLOCK are used
to access a particular block's PTag.
3: Reserved. Generates a data_access_exception.
res Reserved. These bits are ignored.
BLOCK Block. Selects one of the four blocks (0-3) in a set.
SET Set. Selects one of the data cache's 128 sets to access.
zero Zero field. This field should always be zero.
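Forming an ASI 0x0E address can be sketched as below. The bit positions follow Figure 10-10 (TYP in bits 31:30, BLOCK in 27:26, SET in 11:5); the enum and function names are hypothetical. The access itself must be a doubleword load or store (LDDA or STDA) to ASI 0x0E, which is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* TYP values from Figure 10-10; 0 and 3 are reserved. Names are hypothetical. */
enum dc_tag_type { DC_TAG_STAG = 1, DC_TAG_PTAG = 2 };

/* Address for an LDDA/STDA to ASI 0x0E:
   TYP[31:30], BLOCK[27:26], SET[11:5]; all other bits zero. */
static uint32_t dc_tag_addr(enum dc_tag_type typ, unsigned block, unsigned set)
{
    assert(block < 4 && set < 128);
    return ((uint32_t)typ << 30) | ((uint32_t)block << 26) | ((uint32_t)set << 5);
}
```

For an STag access the BLOCK field is ignored, so passing 0 is sufficient.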


PTag Diagnostic Access 
Figure 10-11 shows the data cache PTag format. 


Figure 10-11. Data Cache PTag Format

res [63:57] | V [56] | res [55:49] | D [48] | res [47:41] | S [40] | res [39:24] | PADDR [23:0]
The fields of data cache PTag are:
res Reserved. Read as 0 and ignored on a write.
V Valid bit for the block. This bit indicates that the block and its
associated tag are valid. When this bit is cleared, the corresponding
block is invalid, and none of the other fields of the PTag has
any meaning. At power-on, valid bits are undefined.
D Dirty. When the dirty bit is set, it indicates that the block has been
modified by one or more writes. A dirty bit can never be set in normal
operation in VBus configurations; in the direct MBus configuration,
however, it is set whenever a block is modified in the cache.


In the direct MBus configuration, the data cache is in write-back 
mode. If the dirty bit is set, the block is copied back into main 
memory if it must be replaced. See Subsection 10.4.6 for more 
information. 


S Shared. When the shared bit is set, it indicates that the block is 
“shared” between this cache and another cache. Writes to a 
shared block need special treatment to ensure cache consisten- 
cy. 


PADDR Physical address tag. Bits [35:12] of the physical address of the 
data contained in this cache block. 
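A diagnostic routine that has read a PTag doubleword might decode it as follows. The bit positions (V at bit 56, D at 48, S at 40, PADDR in 23:0) are taken from Figure 10-11, and the names are hypothetical. The second helper rebuilds the block's physical address from PADDR (PA[35:12]) and the set index (PA[11:5]); the result is meaningful only when V is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct dc_ptag {
    bool     valid, dirty, shared;
    uint32_t paddr;      /* PA[35:12] */
};

/* Decodes a raw PTag doubleword read with TYP=2 from ASI 0x0E. */
static struct dc_ptag dc_ptag_decode(uint64_t raw)
{
    struct dc_ptag t;
    t.valid  = (raw >> 56) & 1;
    t.dirty  = (raw >> 48) & 1;
    t.shared = (raw >> 40) & 1;
    t.paddr  = (uint32_t)(raw & 0xFFFFFF);
    return t;
}

/* Physical address of the first byte of the cached block. */
static uint64_t dc_ptag_block_pa(struct dc_ptag t, unsigned set)
{
    assert(set < 128);
    return ((uint64_t)t.paddr << 12) | ((uint64_t)set << 5);
}
```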


STag Diagnostic Access 
Figure 10-12 shows the data cache STag format. 


Figure 10-12. Data Cache STag Format

res [63:12] | USE [11:8] | res [7:4] | LCK [3:0]

The fields of the data cache set tag are:
res Reserved. Read as 0 and ignored on a write.


10-32 Caches/Store Buffer 


Subject to Change Without Notice 





USE Used. This bit field indicates which blocks of the set have been
accessed recently. The position of the bit in the field determines
the block it is associated with. Bits 8-11 correspond to blocks
0-3, respectively. This bit field is updated by the hardware
replacement algorithm and is used to select a block for
replacement.

| USE3 | USE2 | USE1 | USE0 |
   11     10      9      8

LCK Lock bits. Each lock bit can lock a block inside the cache. The
position of the bit determines which block it locks. Bit 0 is fixed
to 0 and ignores writes. Bits 1-3 lock blocks 1-3, respectively.
When a lock bit is set to 1, the corresponding block will not be
displaced by the replacement algorithm.

| LCK3 | LCK2 | LCK1 |  0  |
    3      2      1     0
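Extracting and updating the USE and LCK fields of a raw STag doubleword might look like this. The helper names are hypothetical; the bit positions are those given above (USE in 11:8, LCK in 3:0 with bit 0 fixed at zero). Because an STag write replaces the whole tag, the update helper preserves the existing USE field, which assumes software has first read the STag.

```c
#include <assert.h>
#include <stdint.h>

/* USE and LCK as 4-bit masks (bit n = block n) from a raw STag. */
static inline unsigned stag_use(uint64_t stag) { return (unsigned)((stag >> 8) & 0xF); }
static inline unsigned stag_lck(uint64_t stag) { return (unsigned)(stag & 0xE); } /* bit 0 reads as 0 */

/* New STag value locking the blocks in lock_mask (bit n = block n)
   while keeping the existing USE field. Block 0 cannot be locked. */
static uint64_t stag_with_locks(uint64_t stag, unsigned lock_mask)
{
    assert((lock_mask & 1u) == 0 && lock_mask <= 0xF);
    return (stag & 0xF00) | lock_mask;
}
```

Keep in mind the earlier note: a locked block that is later invalidated by a snoop stays locked and unusable.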


10.5.3 Data Cache Data (ASI=0x0f) 


Data cache data is readable and writable with ASI value 0x0f. This
direct-access capability is provided for diagnostic purposes. The blocks may
only be accessed as 64-bit doublewords. Other data sizes provoke a
data_access_exception. Data cache data is not affected by watchdog reset,
hardware reset, or flash clear. The address format is shown in Figure 10-13.


Figure 10-13. Data Cache Data Address Format

res [31:27] | BLOCK [26:25] | res [24:12] | SET [11:5] | DWORD [4:3] | zero [2:0]


The fields of data cache data address are:

res Reserved. These bits are ignored.

BLOCK Block. Selects one of the four blocks (0-3) in a set.

SET Set. Selects one of the 128 sets of the cache.

DWORD Doubleword. Selects the doubleword within the cache block.

zero Zero field. Must be zero.
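An ASI 0x0F address can be formed the same way as a tag address. This sketch assumes the field positions of Figure 10-13 (BLOCK in 26:25, SET in 11:5, doubleword in 4:3); the function name is hypothetical, and the doubleword LDDA/STDA access itself is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Address for an LDDA/STDA to ASI 0x0F:
   BLOCK[26:25], SET[11:5], doubleword[4:3]; all other bits zero. */
static uint32_t dc_data_addr(unsigned block, unsigned set, unsigned dword)
{
    assert(block < 4 && set < 128 && dword < 4);
    return ((uint32_t)block << 25) | ((uint32_t)set << 5) | ((uint32_t)dword << 3);
}
```

A full dump of the cache data visits 4 blocks x 128 sets x 4 doublewords = 2048 doublewords, or 16 Kbytes.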
10.6 Store Buffer


The SSP's store buffer is a fully-associative cache of eight doubleword entries.
It eliminates most of the performance penalties associated with writes in the
write-through cache configurations (VBus) and with block replacement in
copy-back configurations. Its depth is sufficient to hold 16 SPARC registers,
the number required to save a register window to memory.


10.6.1 General Operation 
A store buffer entry records the state of a pending store operation. The entry
contains the address of the store and the data to be stored. Other important 
information related to the store operation is also recorded in the store buffer. 
Each store instruction requires a single store buffer entry, whether its size is 
byte, halfword, word, or doubleword. A copy-back (replacement of a dirty 
cache block) requires four store buffer entries for the 32 bytes of the cache 
block. 


The store buffer is first-in first-out (FIFO). The order of stores in the buffer is 
never altered. The oldest store in the buffer must be retired before the next old- 
est store is presented to the bus. 


A store buffer entry is retired from the buffer when it is acknowledged on the 
bus. In many systems, this will mean that the write has been completed at the
main memory and all snooping caches. In some systems, the bus acknowl- 
edgement does not indicate completion. In those systems, the SuperSPARC 
processor relies on the PEND signal from the system to correctly implement 
the memory model. (See Section 8.7.) 


The store buffer is a flushing-type buffer. If the doubleword address of a read
transaction matches the doubleword address of any write transaction currently
in the buffer, the read waits until that write has completed before continuing.
No data is returned from the store buffer to the instruction pipeline. This type
of match forces the buffer to flush to memory as quickly as possible.


When the buffer is full, new store operations will cause the pipeline to stall until 
a single entry in the buffer is available. The buffer is not forced to flush in this 
situation. 
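The FIFO behavior described above can be modelled in software as follows. This is a deliberately simplified sketch, not the hardware design: the structure and function names are hypothetical, retirement is triggered by an explicit call rather than by a bus acknowledgement, and only the doubleword-address match that makes a load wait is modelled (no data is forwarded, as in the hardware).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SB_ENTRIES 8

/* Simplified software model of the store buffer. */
struct sb_entry { uint64_t pa; uint64_t data; };
struct store_buffer {
    struct sb_entry e[SB_ENTRIES];
    unsigned head, count;             /* head = oldest entry */
};

/* Queues a store; returns false when full (the pipeline would stall). */
static bool sb_push(struct store_buffer *sb, uint64_t pa, uint64_t data)
{
    if (sb->count == SB_ENTRIES)
        return false;
    unsigned tail = (sb->head + sb->count) % SB_ENTRIES;
    sb->e[tail] = (struct sb_entry){ pa, data };
    sb->count++;
    return true;
}

/* Retires the oldest entry, as when the bus acknowledges it. */
static void sb_retire(struct store_buffer *sb)
{
    assert(sb->count > 0);
    sb->head = (sb->head + 1) % SB_ENTRIES;
    sb->count--;
}

/* A load must wait if any pending store targets the same doubleword. */
static bool sb_load_must_wait(const struct store_buffer *sb, uint64_t pa)
{
    for (unsigned i = 0; i < sb->count; i++)
        if ((sb->e[(sb->head + i) % SB_ENTRIES].pa >> 3) == (pa >> 3))
            return true;
    return false;
}
```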

The store buffer turns sequences of memory references within a cache block
into burst write operations on the bus (only in VBus configurations). The burst
writes continue as long as the successive writes in the buffer continue to
be within the same cache block.
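Under the rule stated above, the number of bus write operations needed to drain a run of buffered stores can be estimated with a short sketch. It assumes 32-byte cache blocks and treats every change of block as the start of a new write; the function name is hypothetical. The store buffer tag format described later includes a burst-mode bit that refers to consecutive addresses, so real hardware may apply a stricter condition than this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counts bus write operations for `n` pending store addresses (oldest
   first), merging successive stores in the same 32-byte cache block
   into one burst write. */
static size_t vbus_write_ops(const uint64_t *pa, size_t n)
{
    size_t ops = 0;
    for (size_t i = 0; i < n; i++)
        if (i == 0 || (pa[i] >> 5) != (pa[i - 1] >> 5))
            ops++;                     /* new cache block: start a new write */
    return ops;
}
```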


The store buffer components (tags, data, and control) are accessible via ASI 
spaces 0x30-0x32 for diagnostic purposes. 
10.6.2 Store Buffer Operation with the MultiCache Controller


The store buffer is primarily used when SuperSPARC is connected to the 
MXCC, where all store operations are completed immediately into the store 
buffer. In this configuration, a store operation will never wait for completion un- 
less the buffer is full or disabled or a TLB miss operation occurs. (See Subsec- 
tion 10.6.4 for a more complete description.) The state of the data cache (hit, 
miss, or disabled) does not affect store buffer operation when SuperSPARC 
is used with the MXCC. 


10.6.3 Store Buffer Operation directly on MBus 


When the SuperSPARC processor is connected directly to the MBus (with no
MXCC), only non-cacheable stores and copy-back data (dirty cache blocks
that are replaced) go into the store buffer. This is primarily due to the
copy-back, write-allocate caching policy used on the MBus.

10.6.4 Non-Buffered (Synchronous) Operations 


When SuperSPARC is used with the MXCC, there are several cases in which 
stores cannot or should not be buffered. These cases include: 


❑ Atomic operations,

❑ Certain store alternates (STA),

❑ MMU R&M updates, and

❑ Operations when the store buffer is disabled.


All of these actions block the execution pipeline until they have completed. 
Since external stores must be performed in order, each of these conditions 
forces all entries in the store buffer to be written to memory before the synchro- 
nous operation begins. 

Table 10-8 shows the ASIs that cause the store buffer to flush all pending
stores before any store to the ASI space can proceed. Accesses to other ASIs
do not flush the store buffer.
Table 10-8. Synchronous ASIs

Function
MMU Probe
MMU Registers
MMU TLB Diagnostic
Instruction Cache Tags
Instruction Cache Data
Data Cache Tags
Data Cache Data
Store Buffer Tags
Store Buffer Data
Store Buffer Control
Instruction Cache Flash Clear
Data Cache Flash Clear
MMU Breakpoint Diagnostic
BIST Diagnostic





10.6.5 Data Store Errors 
When an exception occurs on a buffered write, a data store error is
generated. Unlike most other trap types, when the data store error is reported,
the store instruction that originated the operation has already completed. The
trap PC for a data store error is generally unrelated to the instruction that
caused the error. Furthermore, subsequent instructions may also have
completed, and the register contents from which the original store data and
address were computed may have been lost.


The store buffer contains enough information, however, to perform the store. 
The trap handler for data store error can inspect the contents of the store 
buffer using diagnostic accesses. The contents of the buffer may be used to 
restart the write that failed. 


Data store errors take priority over exceptions other than reset. In order for
a data store error to be reported, the PSR.ET (enable traps) bit must be set,
and the MCNTL.NF (no-fault) bit must be cleared.
External Indications


During a store buffer exception, the SuperSPARC processor, when used with
the MXCC, will assert ERROR for one cycle. When SuperSPARC is directly
on the MBus, it will assert AERR until MFSR.SB is cleared (for example, by a
read).


Note:

The ERROR or AERR signal is also used to indicate error mode and cannot
be unambiguously interpreted as indicating a store buffer error.


Trap Handler


When an error occurs, the store buffer is automatically disabled. A store buffer
flush is not initiated, and no other writes will be issued from the store buffer.
All buffer entries, including the faulting one, are retained in the buffer. All
subsequent writes (nominally in the trap handler) are synchronous, bypassing
the buffer. This will continue until the store buffer is re-enabled.


Once in the trap handler, the faulting write can be retried. This is accomplished
by loading the faulting address and data directly from the store buffer into
registers (using the diagnostic ASI accesses to the buffer's address and data
information). A store alternate through the MMU pass-through ASIs can
then be used to perform an untranslated store to this physical address. In this
case, the MCNTL.AC (alternate cacheable) bit is used to determine
cacheability. The AC bit should be set to the same value as the C bit field of the
store buffer tag register (since this is the state of the C bit in the original
transaction).


The data store error handler code should set the MCNTL.NF (no-fault) bit be- 
fore attempting to retry the operation. After the STA to retry the operation, the 
MFSR error bits should be checked explicitly to determine whether the opera- 
tion was successful (see Subsection 9.12.3.7). If the MCNTL.NF bit is not set, 
an error on the retry of these transactions can cause entry into error mode. 


If this retry fails, system software must decide how to continue recovery efforts.
In VBus configurations, the error is from a store operation in the current context
(since context changes always force a store buffer copy-out, pending stores
from another context cannot be present). Once the faulting process is
identified, the process can be interrupted or killed, rather than the entire system
stopped.
Note:


For SuperSPARC on MBus (with no MXCC), a data store error resulting from
a copy-back operation is not guaranteed to be from the current context. In
this case, isolating faulting accesses to the process that caused them is more
difficult. This situation may still be recoverable, but recovery may be very
complex.





Memory Order in the Presence of Errors and Recovery 


Standard ordering of memory transactions can be maintained even in the 
presence of store buffer errors and recovery operations. As long as the recov- 
ery routines retry store buffer operations in the order they were requested, no 
ordering is changed. Any memory operations within the trap handler can be 
considered to have executed "before" these buffered writes are performed. 


Recovery by Reenabling the Store Buffer 


When a store buffer exception (data store error) occurs, the store buffer 
pointers retain their present value, as if the store were given a retry acknowl- 
edgement by the bus. Thus, the trap handler can also re-issue the faulted store 
by simply re-enabling the store buffer. This works for both cacheable and non- 
cacheable stores. 


Setting MCNTL.NF will prevent another store buffer trap from being taken if the 
store sees another exception. In this case, the store buffer will still turn off, even 
when MCNTL.NF is set. However, the store buffer error remains pending, and
the integer unit pipeline does not see the error until MCNTL.NF is clear. The 
SBCNTL.ER (error pending) field in the store buffer control register (see Sub- 
section 10.7.3) indicates the presence of a pending (unacknowledged) store 
buffer error. The trap handler might also check if the store buffer is still enabled 
after the store is retried. 


Accessing Store Buffer State for Error Recovery 
When a store buffer exception (data_store_error) occurs, the SuperSPARC
processor retains information for all pending stores, including the store which
encountered the exception, in the store buffer. The following can be accessed
in several ASI control spaces:


❑ Store buffer control (ASI 0x32),
❑ Store buffer tag (ASI 0x30),
❑ Store buffer data (ASI 0x31), and
❑ MFSR.SB (ASI 0x04).







These store buffer entries assist recovery from a store buffer exception. The
SBCNTL.DPTR (drain pointer; see Subsection 10.7.3) is left intact after an
exception to allow software recovery. In order to re-enable the store buffer after
a data store error was taken, the SBCNTL.DPTR must be initialized to allow
proper execution.


10.6.6 Disabled Operation—Strong Ordering 


The store buffer is disabled when SuperSPARC is first powered up. As a
result, all stores are synchronous and block the execution pipeline until they
complete. During normal operation, disabling the store buffer allows
applications to be run with strong sequential ordering of memory references.


Since disabling the store buffer requires a store to a synchronous ASI, the 
store buffer will always flush before it can be disabled via store buffer control. 


In most system configurations, strong ordering will substantially slow proces- 
sing speed. See Chapter 8 for more information on Strong Ordering and the 
other memory models. 
10.7 Store Buffer Diagnostic and Control Interfaces


This section describes low-level diagnostic and control interfaces to the store
buffer. These interfaces are accessed via LDA, LDDA, STA, and STDA
instructions to ASIs assigned for this purpose. Table 10-9 shows the ASIs used
for diagnostic and control access to the store buffer.


Table 10-9. Store Buffer ASIs

Function                    ASI
Store Buffer Tags           0x30
Store Buffer Data           0x31
Store Buffer Control        0x32





10.7.1 Store Buffer Tags (ASI=0x30)


This ASI performs reads and writes, for diagnostic and error-recovery pur- 
poses, to the store buffer's physical tags. 


The address format is shown in Figure 10-14.
Figure 10-14. Store Buffer Tag Address Format

reserved [31:6] | ENTRY [5:3] | zero [2:0]

The fields are:


reserved Reserved. These bits are ignored. 


ENTRY Entry number. This field selects one of the eight entries of the 
store buffer to access. 


zero Field of zeros. Must be zero. 


Accessing the store buffer tags with an access size other than a doubleword
will result in a data_access_exception.


The address selects the tags to access. Tags are read from the selected entry 
with an LDDA instruction or written to the selected entry with a STDA instruc- 
tion. 


The format of a store buffer tag is shown in Figure 10-15.
Figure 10-15. Store Buffer Tag Format

reserved [63:43] | barrier [42] | burst [41] | V [40] | S [39] | C [38] | SIZE [37:36] | Address [35:0]
reserved These bits are read as 0s and ignored for writes.

Store Barrier Bit. A STBAR instruction sets this bit in the most
recent entry in the store buffer, if there are any entries. Newly
allocated store buffer entries have this bit clear. Once set, the
bit remains set in an entry until the entry is retired. See
Section 8.7 for more information.

Burst Mode Access. This bit, if set, indicates that the next entry
in the store buffer corresponds to the next consecutive address
and can thus be issued in burst mode on VBus. This bit is cleared
in newly allocated entries.

Valid Bit. If set, this bit indicates that the entry is valid. It is
cleared when the entry is retired.

Supervisor Bit. If set, this bit indicates that the entry is for a store
issued while PSR.S was set, so the entry belongs to a supervisor
process.

C Cacheable Bit. If set, this bit indicates that the entry is a
cacheable access. See Table 10-5 for data cacheability.

SIZE Size of the transaction, encoded according to Table 10-10.

Table 10-10. Transaction Table

Size | Data Quantity
00   | Byte
01   | Halfword
10   | Word
11   | Doubleword

Address Address for the store buffer entry. This address must be
correctly aligned for the size of the transaction (see Table 10-4).
For instance, the lower three bits of the address of a doubleword
store buffer entry must be 0; if not, a mem_address_not_aligned
exception is generated.
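A minimal sketch of the alignment rule, assuming the SIZE encoding of Table 10-10 (00 byte, 01 halfword, 10 word, 11 doubleword), so that a SIZE value s covers 2^s bytes and an address is aligned when its low s bits are zero. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bytes covered by a 2-bit SIZE code (assumed encoding: 1, 2, 4 or 8). */
unsigned sb_size_bytes(unsigned size)
{
    return 1u << (size & 3u);
}

/* True if addr is naturally aligned for a transaction of this SIZE;
 * a misaligned entry corresponds to a mem_address_not_aligned exception. */
bool sb_address_aligned(uint64_t addr, unsigned size)
{
    return (addr & (uint64_t)(sb_size_bytes(size) - 1u)) == 0;
}
```

For example, a doubleword entry (SIZE = 11) at address 0x1004 fails the check, while a word entry (SIZE = 10) at the same address passes.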







10.7.2 Store Buffer Data (ASI=0x31) 


This ASI is used to perform reads and writes to 64-bit data stored in store buffer 
entries. Data entries are addressed in the format shown in Figure 10-16, 
which is the same as the store buffer tag address format. 
Figure 10-16. Store Buffer Data Address Format


| reserved (bits 31:6) | ENTRY (bits 5:3) | zero (bits 2:0) |

reserved Ignored. Should be zero. 


ENTRY Entry number. This field selects one of the eight entries of the
store buffer.


zero Zero field. Must be zero. 
The store buffer data must be accessed as doublewords. Other access
sizes generate a data_access_exception.

10.7.3 Store Buffer Control Register (SBCNTL) (ASI=0x32) 


The store buffer control register (SBCNTL), which contains the fill and drain
pointers, is accessed with ASI 0x32. SBCNTL must be accessed as a single
word (see Figure 10-17); any other access size generates a
data_access_exception. The address field is not used for this access, and any
address can be used.


Figure 10-17. Store Buffer Control Register Format

| reserved (31:9) | SE (8) | EM (7) | ER (6) | DPTR (5:3) | FPTR (2:0) |


reserved These reserved bits read as 0s and are ignored on writes.


SE Store Buffer Enabled. This bit is read-only and indicates whether
the store buffer is enabled (1) or disabled (0). It is a shadow
copy of the MCNTL.SB bit and is provided for convenience.


EM Store Buffer Empty. This bit is read-only. SBCNTL.EM indicates
whether the store buffer is empty (1) or non-empty (0). It is set
at reset.

ER Store Buffer Error Pending. This bit is read-only and indicates
whether an untaken store buffer error is pending. This bit is set
when a store buffer exception occurs while traps are disabled
(PSR.ET=0). It is cleared by taking the store buffer exception
trap, which occurs automatically when traps are re-enabled.
The bit is also cleared on reset.
DPTR Drain Pointer. The drain pointer indicates the first entry of the
store buffer that would perform a bus transaction. 


FPTR Fill Pointer. The fill pointer indicates the first entry of the store
buffer where a new store request can be written. If it is equal to
the drain pointer and the buffer holds valid entries, the store buffer is full.


The number of pending stores is determined by subtracting the drain pointer
from the fill pointer (modulo 8). When the two pointers are equal and there are
valid entries, the store buffer is full and cannot accept any more entries. The
buffer is empty when the two pointers are equal and there are no valid entries.
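The pointer arithmetic can be expressed as a small C helper operating on a value read from SBCNTL with an LDA to ASI 0x32. The bit positions follow the layout given in Figure 10-17 (FPTR in bits 2:0, DPTR in bits 5:3, EM in bit 7) and should be treated as an assumption; the macro and function names are hypothetical. Because equal pointers are ambiguous, the sketch uses SBCNTL.EM to tell an empty buffer from a full one.

```c
#include <stdint.h>

/* Assumed SBCNTL layout (Figure 10-17). */
#define SBCNTL_EM       (1u << 7)                 /* store buffer empty */
#define SBCNTL_DPTR(v)  (((v) >> 3) & 7u)         /* drain pointer */
#define SBCNTL_FPTR(v)  ((v) & 7u)                /* fill pointer */

/* Number of pending stores: (FPTR - DPTR) mod 8, except that equal
 * pointers mean empty when EM is set and full (8 entries) otherwise. */
unsigned sbcntl_pending_stores(uint32_t sbcntl)
{
    unsigned n = (SBCNTL_FPTR(sbcntl) - SBCNTL_DPTR(sbcntl)) & 7u;
    if (n == 0 && !(sbcntl & SBCNTL_EM))
        return 8;   /* pointers equal with valid entries: buffer full */
    return n;
}
```

Unsigned subtraction followed by masking with 7 gives the modulo-8 difference even when the fill pointer has wrapped past the drain pointer.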
The SuperSPARC Floating Point Unit (FPU) operates in accordance with The
SPARC Architecture Manual. This chapter details some of the FPU's opera-
tions, concentrating on the floating-point-queue interface, special numeric
cases, and floating-point exceptions.


11.1 Floating-Point Instructions


SuperSPARC's FPU executes floating-point operate instructions (FPops). It
also participates with the integer unit (IU) in executing floating-point events
(FPevs). An FPev is one of the following:

□ An LDF, STF, LDDF, or STDF instruction (load or store floating-point
registers).

□ An LDFSR or STFSR instruction (load or store the Floating-Point Status
Register).

□ An STDFQ instruction (store the Floating-Point Deferred Trap Queue).

□ An SMUL, SMULcc, UMUL, UMULcc, SDIV, SDIVcc, UDIV, or UDIVcc
instruction (these integer multiply and divide instructions execute in the
FPU).

FPops include all instructions (such as FMULS) that calculate a result to a
floating-point (f) register from operands in f registers. FPops include FBfcc
instructions, floating-point move (FMOVS) instructions, compare (such as
FCMPD) instructions, and convert (such as FSTOD) instructions.

SuperSPARC completes all FPops in program order. Since only a single FPop
can be issued to the FPU per instruction group, FPops arrive at the FPU one
at a time. In the FPU they are entered in the Floating-Point Deferred Trap
Queue (FQ). Once the needed operands and resources are available, the
FPop at the head of the FQ begins execution.

The SuperSPARC FPU is interlocked for register dependencies. An FPop
waits in the FQ until its register operands are ready. Usually, a dependency is
on a previous FPop that is computing a new value for an f register. When the
new value has been computed, it is forwarded directly to the second FPop as
it enters execution. Some references call this "chaining."

The following paragraphs detail important aspects of floating-point operation
on the SuperSPARC processor (SSP). Many are clarifications of, or
implementation choices within, the SPARC Architecture.

11.1.1 FBfcc

An FBfcc instruction can immediately follow an FCMP instruction on
SuperSPARC. Some previous SPARC FPUs required at least one instruction
between the compare and the branch, and the SPARC Architecture specifies
that this instruction is required. Programs that do not need to be portable to
other SPARC implementations may use the FCMP/FBfcc sequence without
any instructions between them.
11.1.2 d Register Addresses


FPops and FPevs that access double-precision floating-point registers
(d registers) ignore the least significant bit of the register index and access an
aligned register pair as if the least significant bit were zero.
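As a small illustration, this behavior amounts to clearing bit 0 of the 5-bit f register index before the access. The function name is hypothetical.

```c
/* Index of the even f register that begins the pair actually accessed
 * when a double-precision FPop or FPev names f register `index`. */
unsigned dreg_pair_base(unsigned index)
{
    return (index & 31u) & ~1u;   /* 5-bit index with bit 0 ignored */
}
```

So a reference to %f5 as a d register reads or writes the pair %f4 and %f5, with no exception raised.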


The SPARC Architecture recommends an exception for non-aligned d and q
(quad-precision) floating-point register accesses. SuperSPARC does not
generate this exception.


11.1.3 FSR Version and Implementation Fields 


FSR.VER and FSR.IMPL are always zero in the SSP's FPU. 


11.1.4 Unfinished FPop Exceptions 


SuperSPARC completes all FPops in hardware or generates the appropriate
IEEE 754 exception. Therefore, unfinished_FPop exceptions are never
generated. Previous FPUs generated unfinished_FPop exceptions when
subnormal operands or results were encountered; SuperSPARC handles
these cases in hardware. Extra cycles may be required when an operand or
result is subnormal (see Section 11.3).


11.1.5 NaN to Integer Conversion


When the operand to a convert-to-integer instruction (FSTOI or FDTOI) is a
NaN (Not a Number), SuperSPARC always produces 0 as the result. Some
previous implementations of the FPU produce either 0x80000000 for -NaN or
0x7FFFFFFF for +NaN.


11.1.6 NaN Results


When SuperSPARC's FPU produces a NaN result, it is stored in one of the
fixed formats shown in Table 11-1. This is different from earlier SPARC FPU
implementations and from the IEEE 754 standard.


Table 11—1. NaN Output Representation Values 










A NaN can only be generated when NaN reporting is masked
(FSR.TEM.NVM = 0).
11.1.7 Rounding Operations and Underflow Detection


SuperSPARC's FPU detects underflow after the rounding operation. The IEEE
754 specification allows detection either before or after rounding, and the
SPARC Architecture also allows either method.


11.1.8 Quad Precision and Extended Precision 


SuperSPARC does not support quad-precision (128-bit) floating-point
operations. Any attempt to execute a quad-precision FPop traps with an
fp_exception with FSR.ftt = 3 (unimplemented_FPop). There are
no quad-precision FPevs in The SPARC Architecture Manual.


The SPARC Architecture, version 7, did not have quad-precision operations 
but had extended-precision (80-bit) operations instead. The extended-preci- 
sion operations of version 7 have been replaced by the quad-precision 
operations in SPARC, version 8. 


11.1.9 Floating-Point Exceptions 
SuperSPARC implements deferred floating-point exceptions using the FQ. A
floating-point exception remains pending until another floating-point
operation is requested, at which point the pending exception is reported.
Recognition of exceptions is governed by the state machine shown in
Figure 11-1.
Figure 11-1. State Machine of FPU States

[State diagram. Transition labels: "fp_exception", "FSR.qne = 0", and "FPop or FPev except STDFQ or STFSR".]


Once an exception is taken, the fp_exception trap handler must empty the FQ 
before attempting to execute any FPops or FPevs, except for STDFQ and 
STFSR instructions. An attempt to execute a floating-point instruction after a 
trap and before the FQ has been emptied will generate an fp_sequence_error 
exception. 
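The rules above can be modelled as a three-state machine. This sketch borrows the state names of The SPARC Architecture Manual (fp_execute, fp_exception_pending, fp_exception) as an assumption about how Figure 11-1 is organized; all C identifiers are hypothetical, and only the number of queued entries is tracked, not the queued FPops themselves.

```c
/* FPU states, named after The SPARC Architecture Manual's terms. */
enum fpu_state { FP_EXECUTE, FP_EXCEPTION_PENDING, FP_EXCEPTION };

/* Kind of floating-point instruction reaching the FPU. */
enum fp_insn { FP_OTHER, FP_STDFQ, FP_STFSR };

/* Outcome of sending one floating-point instruction to the FPU. */
enum fp_result { FP_ISSUED, FP_TRAP_EXCEPTION, FP_TRAP_SEQUENCE_ERROR };

struct fpu {
    enum fpu_state state;
    unsigned fq_entries;   /* valid FQ entries; FSR.qne = (fq_entries != 0) */
};

/* An executing FPop detected an exception: it is deferred, not trapped. */
void fpu_raise(struct fpu *f)
{
    if (f->state == FP_EXECUTE)
        f->state = FP_EXCEPTION_PENDING;
}

enum fp_result fpu_issue(struct fpu *f, enum fp_insn insn)
{
    switch (f->state) {
    case FP_EXCEPTION_PENDING:
        /* The deferred exception is reported to this instruction and
         * the fp_exception trap puts the FPU in exception mode. */
        f->state = FP_EXCEPTION;
        return FP_TRAP_EXCEPTION;
    case FP_EXCEPTION:
        if (insn == FP_STDFQ) {
            if (f->fq_entries > 0)
                f->fq_entries--;          /* store the next FQ entry */
            if (f->fq_entries == 0)
                f->state = FP_EXECUTE;    /* FSR.qne = 0: queue emptied */
            return FP_ISSUED;
        }
        if (insn == FP_STFSR)
            return FP_ISSUED;
        return FP_TRAP_SEQUENCE_ERROR;    /* FQ not yet emptied */
    case FP_EXECUTE:
    default:
        return FP_ISSUED;
    }
}
```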


If an exception first occurs in the same cycle in which a new floating-point
instruction is being sent to the FPU (the main pipeline's E0 stage), the exception
does not cause a trap on the new floating-point instruction. The exception
remains pending until the next floating-point operation is sent to the FPU. If the
new floating-point instruction is an FPev (load or store) that either depends
on an FPop in the queue or modifies an operand of an FPop in the queue, the
main pipeline is held until the queued FPop completes, and any exception
it raises is reported immediately to the waiting FPev.


The presence of processor pipeline hold conditions (particularly from data
cache misses) can delay the acceptance of a floating-point exception. The
exception is not taken until a floating-point instruction is in E0 and the
pipeline is not being held.
SuperSPARC reports exceptions immediately to instructions that are
dependent on the result of an instruction that generates an exception. 
Otherwise, the exception will be reported to a later floating-point operation. 


When an fp_exception is signalled, FSR.ftt contains the trap type for this
floating-point trap. FSR.ftt is encoded according to Table 11-2.


Table 11-2. Floating-Point Trap Type (ftt) Field of FSR

ftt | Trap Type
0   | None
1   | IEEE 754 exception
2   | unfinished FPop †
3   | unimplemented FPop
4   | sequence error
5   | hardware error †
6   | invalid fp register †
7   | reserved

† SuperSPARC does not signal these trap types.

SuperSPARC generates only three of the fp exception types:

IEEE 754 exception The floating-point exception is an invalid exception,
overflow exception, underflow exception, division by zero exception, or
inexact exception, as encoded in FSR.cexc. FSR.aexc and FSR.fcc are not
affected by the instruction that caused this exception.

unimplemented FPop An attempt was made to execute an unimplemented
(or undefined) FPop.

sequence error After an exception was signalled, a floating-point
instruction other than STDFQ or STFSR was attempted before the FQ was
emptied. A sequence error is reported and recorded in the FSR when a new
floating-point request is issued while the FPU is in exception mode. The
exception is reported to the instruction causing the sequence error.


11.1.10 Non-Standard Mode


SuperSPARC's FPU does not have a non-standard mode (FSR.NS). The
FSR.NS bit can be read or written but has no effect on floating-point execution.
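As an illustration of Table 11-2, a trap handler can extract the ftt field from an FSR value stored with STFSR. The bit positions used here (ftt in bits 16:14, qne in bit 13, cexc in bits 4:0) come from the SPARC Version 8 FSR definition rather than from this chapter; the macro and function names are hypothetical.

```c
#include <stdint.h>

/* FSR field positions from the SPARC Version 8 definition. */
#define FSR_FTT(fsr)   (((fsr) >> 14) & 7u)   /* bits 16:14 */
#define FSR_QNE(fsr)   (((fsr) >> 13) & 1u)   /* bit 13 */
#define FSR_CEXC(fsr)  ((fsr) & 0x1Fu)        /* bits 4:0 */

/* Name of the trap type encoded in FSR.ftt (Table 11-2). */
const char *fsr_ftt_name(uint32_t fsr)
{
    static const char *const names[8] = {
        "none", "IEEE_754_exception", "unfinished_FPop",
        "unimplemented_FPop", "sequence_error", "hardware_error",
        "invalid_fp_register", "reserved"
    };
    return names[FSR_FTT(fsr)];
}

/* Nonzero for the three ftt values SuperSPARC signals: 1, 3 and 4. */
int supersparc_signals_ftt(uint32_t fsr)
{
    unsigned ftt = FSR_FTT(fsr);
    return ftt == 1u || ftt == 3u || ftt == 4u;
}
```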




Floating-Point Unit Operation 


Subject to Change Without Notice 







11.1.11 Integer Multiply 


Since integer multiply instructions (SMUL, SMULcc, UMUL, UMULcc) use
the multiplier of the FPU, the timing of integer multiply operations and
floating-point operations is interrelated. Integer multiply operations start only if
the floating-point queue is empty or if the FPU is in exception mode. Normal
floating-point operations do not resume until the integer multiply has completed.
Integer multiply operations do not affect the floating-point registers or
floating-point condition codes (FSR.fcc).


Integer multiply operations cannot cause any exceptions. 


11.1.12 Integer Divide 


Since integer divide instructions (SDIV, SDIVcc, UDIV, UDIVcc) use logic in
the FPU, the timing of integer divide operations and floating-point operations
is interrelated. Integer divide operations start only if the floating-point queue
is empty or if the FPU is in exception mode. Floating-point operations do not
resume until the integer divide has completed. Integer divide operations do not
affect the floating-point registers or floating-point condition codes (FSR.fcc).


Integer divide operations can cause division_by_zero and illegal_instruction
exceptions but do not cause any floating-point exceptions.
11.2 Floating-Point Deferred Trap Queue (FQ)


SuperSPARC’s FPU contains a queue that holds up to four entries corre- 
sponding to FPops that have been issued to the FPU but have not completed. 
Each entry has the instruction and the virtual address from which it was 
fetched. FPevs are not entered into the FQ. 


The contents of the FQ can be stored to memory with STDFQ instructions. 
STDFQ should be issued only when the FPU is in exception mode and 
FSR.qne = 1. If an STDFQ instruction is executed on a SuperSPARC FPU 
when not in exception mode, the processor will hold the main (integer) pipeline 
until all floating-point operations have completed and then store the 
information from the last completed FPop. 


The FQ is not initialized at reset. Therefore, executing STDFQ instructions 
before any FPops have executed will store undefined values. 


A STDFQ will store a doubleword to memory in the format shown in 
Figure 11-2. Each doubleword is a single queue entry containing an FPop 
instruction and the address from which it was fetched. 


Figure 11-2. Floating-Point Queue Format 
Word          | Contents (bits 31:0)
Address + 0x0 | 32-bit virtual address (program counter) of the FPop
Address + 0x4 | FPop instruction


Each time an STDFQ instruction is executed while in exception mode, the next
entry in the queue is stored. The FSR.qne bit should be checked before
each store to identify the last valid queue entry.
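A trap handler that saves the FQ with STDFQ receives one doubleword per entry. This sketch splits such a doubleword, assuming the layout of Figure 11-2 with the FPop address in the word at Address + 0x0 and the instruction in the word at Address + 0x4; since SPARC is big-endian, the lower-addressed word is the upper half of the 64-bit value. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* One FQ entry as stored by STDFQ. */
struct fq_entry {
    uint32_t addr;   /* virtual address the FPop was fetched from */
    uint32_t insn;   /* the FPop instruction word */
};

/* Splits the doubleword written by one STDFQ (big-endian layout). */
struct fq_entry fq_entry_unpack(uint64_t dword)
{
    struct fq_entry e;
    e.addr = (uint32_t)(dword >> 32);            /* word at Address + 0x0 */
    e.insn = (uint32_t)(dword & 0xFFFFFFFFu);    /* word at Address + 0x4 */
    return e;
}
```

A handler would apply this once per STDFQ, stopping when FSR.qne reads as 0.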
11.3 Timing of FPops


The SSP can start a new FPop every cycle, but the different floating-point
operations require different numbers of cycles to complete. Table 11-3,
Table 11-4, and Table 11-5 summarize the latency values of the
SuperSPARC FPU.


The execution latency of each of these FPops depends on the source(s) and 
destination data formats. The notation used in these tables designates: 


n floating-point normal number.

s floating-point subnormal number.

nn=n Identifies source 1, source 2, and destination data formats,
respectively.


Table 11-3. Floating-Point Operation Latency—Variable-Latency FPops 
Table 11-4. Floating-Point Operation Latency: Three-Cycle FPops





† The only result is a condition code (fcc).


Table 11—5. Floating-Point Operation Latency—Unary Latency FPops 
Traps


This chapter describes SPARC traps and the SuperSPARC processor's
(SSP's) implementation of them. The traps are implemented as defined
in The SPARC Architecture Manual.


12.1 Introduction
A trap is a vectored transfer of control to supervisor software through the trap
table. Traps are caused by one or more of the following:


□ Enabled exceptions.

Exceptions are generated by the execution of a particular instruction.
Execution of the instruction generating the exception may be simulated by
supervisor software or retried after intervention by supervisor software, or
supervisor software may abort the program that generated the exception.
Most types of exceptions cannot be disabled; however, some types of
fp exceptions may be masked (see Section 4.9), and memory and
Memory Management Unit (MMU) exceptions can be masked with
MCNTL.NF (see Chapter 9).

□ Errors.

Errors result from serious hardware problems or supervisor software
malfunctions that generally cannot be attributed to a single instruction and
cannot be recovered. Errors are not maskable.

□ Resets.
Resets are described in Chapter 13.

□ Interrupts.

Interrupts are externally or internally generated asynchronous events.
Externally generated interrupts are usually caused by I/O events.
SuperSPARC's only internally generated interrupts are from the breakpoint
facility, as described in Chapter 15.


A trap is always reported to a particular instruction, which is identified as the
trapping instruction. Exceptions are always reported to the instruction that
caused them, except for floating-point IEEE 754 exceptions, which are deferred
and reported to a later floating-point instruction. An error generally cannot be
reported to the instruction that caused it, so it is reported to a later
instruction. Interrupts are reported to an instruction in the E0 pipeline stage at
the time the interrupt is recognized. If possible, error mode reset is reported
to the instruction that caused it. Hardware reset acts immediately.


A trap is processed automatically and performs a series of steps. The trap:

1) Stops executing the trapping instruction and any instructions after it that
may have started execution.

2) Completes any pending stores from the store buffer.

3) Saves PSR.S in PSR.PS and sets PSR.S.

4) Clears PSR.ET.

5) Decrements CWP to a new window (as SAVE does), but without checking
for window overflow.

6) Saves the PC in the new window's local 1 (%l1 or %r17) and the nPC in the
new window's local 2 (%l2 or %r18).

7) Starts executing from the trap vector location for this trap type.
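Steps 3 through 7 can be sketched as a C model operating on a hypothetical CPU state structure. The PSR bit positions (S = bit 7, PS = bit 6, ET = bit 5, CWP = bits 4:0) are taken from the SPARC Version 8 definition, and NWINDOWS = 8 is an assumption for SuperSPARC. Steps 1 and 2 (halting the pipeline and draining the store buffer) are not modelled, and only %l1 and %l2 of each window are represented.

```c
#include <stdint.h>

#define NWINDOWS  8u          /* assumed register window count */
#define PSR_S     (1u << 7)   /* supervisor */
#define PSR_PS    (1u << 6)   /* previous supervisor */
#define PSR_ET    (1u << 5)   /* enable traps */
#define PSR_CWP   0x1Fu       /* current window pointer, bits 4:0 */

/* Hypothetical CPU state: only what trap entry touches. */
struct cpu {
    uint32_t psr, tbr, pc, npc;
    uint32_t l1[NWINDOWS];    /* %l1 (%r17) of each window */
    uint32_t l2[NWINDOWS];    /* %l2 (%r18) of each window */
};

void trap_enter(struct cpu *c, uint8_t tt)
{
    uint32_t psr = c->psr;
    unsigned cwp = (unsigned)(psr & PSR_CWP) % NWINDOWS;

    /* 3) copy PSR.S into PSR.PS, then set PSR.S */
    psr = (psr & ~PSR_PS) | ((psr & PSR_S) ? PSR_PS : 0u);
    psr |= PSR_S;
    /* 4) clear PSR.ET */
    psr &= ~PSR_ET;
    /* 5) decrement CWP modulo NWINDOWS; WIM is not checked */
    cwp = (cwp + NWINDOWS - 1u) % NWINDOWS;
    psr = (psr & ~PSR_CWP) | cwp;
    /* 6) save PC and nPC in the new window's %l1 and %l2 */
    c->l1[cwp] = c->pc;
    c->l2[cwp] = c->npc;
    /* 7) record tt in TBR bits 11:4 and continue at the trap vector */
    c->tbr = (c->tbr & ~0xFF0u) | ((uint32_t)tt << 4);
    c->psr = psr;
    c->pc  = c->tbr;
    c->npc = c->tbr + 4u;
}
```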
12.2 Trap Types

Each trap has a trap type that indicates the kind of exception, error, reset, or
interrupt that generated it. For some trap types, additional registers must be 
examined to determine the precise cause of the trap. 


The trap types that occur in the SSP are shown in Table 12-1. The table lists 
the following for each type of trap: 


□ Trap priority rank.

Describes the exception priority rank. If more than one exception is
detected at a given instruction, the trap with the lowest priority rank is
taken.

□ Type number.
The value to which TBR.tt will be set.

□ Trap table offset.
The offset to the first instruction of the trap handler.


The trap table has room for four instructions of the trap handler. Longer trap
handlers can branch to the remainder of the handler outside the trap table. See
Section 4.4.
Table 12-1. Traps Supported by SuperSPARC

Columns: TRAP NAME | PRIORITY | TRAP TYPE | TRAP TABLE OFFSET. Rows include division by zero and interrupt levels 15 through 1.
Trap Types Not Implemented


SuperSPARC does not implement certain optional trap types defined by the
SPARC Architecture. The trap types not implemented by SuperSPARC are
shown in Table 12-2.


Table 12-2. Trap Types Not Implemented by SuperSPARC

Trap Type                          | Explanation
Instruction access error           | All instruction access faults are signalled as instruction access exceptions.
R register access error            | SuperSPARC does not detect any condition that could lead to this error.
Coprocessor exception              | SuperSPARC does not support a coprocessor.
Data access error                  | All data access faults are signalled as data access exceptions.
Unimplemented FLUSH                | FLUSH is implemented.
Unimplemented MUL                  | The various forms of integer multiply are implemented.
Unimplemented DIV                  | The various forms of integer divide are implemented.
Implementation-Dependent-Exception | SuperSPARC has no additional trap types beyond those in The SPARC Architecture Manual.
12.3 Exceptions and Program Counters


Three program counters are involved when an exception occurs:

□ XPC, the exception program counter.

□ XNPC, the exception next program counter.

□ XHPC, the beginning of the exception handler code.


SuperSPARC recovers the XPC and XNPC values from one of the many program
counters maintained in the pipeline, depending on the pipeline stage
where the exception occurs. The values are then stored in registers %l1 and %l2
in the new register window before entry into the trap handler. XHPC is the TBR,
which is computed from the TBA register and the trap type (tt), as shown in
Figure 12-1.


Figure 12-1. Exception Handler Program Counter (XHPC)

| TBA (bits 31:12) | tt (bits 11:4) | 0000 (bits 3:0) |
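The XHPC computation reduces to a mask and a shift. In this sketch (hypothetical function name), TBA occupies bits 31:12, tt bits 11:4, and the low four bits are zero, so each trap table entry spans 16 bytes, which is the room for four instructions noted in Section 12.2.

```c
#include <stdint.h>

/* Exception handler PC formed from the trap base address and trap type. */
uint32_t xhpc(uint32_t tba, uint8_t tt)
{
    return (tba & 0xFFFFF000u)      /* TBA, bits 31:12 */
         | ((uint32_t)tt << 4);     /* tt, bits 11:4; bits 3:0 are zero */
}
```

For example, with the trap table at 0x40000000, the window overflow trap (type 0x05) vectors to 0x40000050.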
12.4 Trap Priorities

When one or more exceptions, errors, interrupts, or resets occur on a single
instruction, only the condition with the lowest priority rank will be recognized 
and trapped. The priority rank of the trap types in SuperSPARC is shown in 
Table 12-1. 


SuperSPARC can execute more than one instruction in a given cycle, and 
each of those instructions can have one or more exceptions. All exception 
requests propagate through the instruction pipeline and are resolved only in 
the last pipe stage (WB). Only the exception from the earliest (in program or- 
der) excepting instruction will be recognized and trapped. Priority ranks are 
never compared between two instructions. A low-priority exception at an earli- 
er instruction in a given instruction group has precedence over a high-priority 
exception at a later instruction in the same group. 
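This selection rule can be sketched with a hypothetical record for each exception request that reaches WB. Program order is compared first, and priority ranks only break ties between requests from the same instruction. The example ranks in the test values are taken from Section 12.8; all identifiers are hypothetical.

```c
#include <stddef.h>

/* One exception request reaching WB (hypothetical record). */
struct exc_req {
    unsigned insn_order;  /* position in program order within the group, 0 = oldest */
    unsigned rank;        /* priority rank from Table 12-1, lower wins */
    unsigned tt;          /* trap type */
};

/* Returns the request that is trapped, or NULL if there is none. */
const struct exc_req *select_trap(const struct exc_req *reqs, size_t n)
{
    const struct exc_req *best = NULL;
    for (size_t i = 0; i < n; i++) {
        const struct exc_req *r = &reqs[i];
        if (best == NULL ||
            r->insn_order < best->insn_order ||            /* earlier instruction wins */
            (r->insn_order == best->insn_order &&
             r->rank < best->rank))                        /* same instruction: rank decides */
            best = r;
    }
    return best;
}
```

For example, with a data store error (rank 2) on the second instruction of a group and both a misaligned-address request (rank 10) and an illegal instruction request (rank 7) on the first, the illegal instruction trap (type 0x02) is taken.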
12.5 Error Mode


Error mode can be entered when any exception or error occurs while the ET
bit is clear. In general, these are considered fatal errors, though system
software may be able to recover in certain cases.


Supervisor software should take care not to generate exceptions while PSR.ET
is clear. A trap handler can set MCNTL.NF immediately on entry. The MCNTL.NF
(No-Fault) bit disables reporting of data access exceptions and other
memory errors (such as bus errors) to the CPU.


Error mode generates a watchdog reset trap, which acts like any other trap, 
with a few differences. For most error-mode traps, the TBR.tt (trap type) field 
is not set. This is to retain the prior value of TBR.tt to help recovery. There is 
one case where this rule is not applied: If error mode is entered during the ex- 
ecution of an RETT instruction, the TBR.tt field will be set based on the exact 
cause of certain exceptions. See Section B.26, Return from Trap Instruction, 
in The SPARC Architecture Manual. 


In all cases, the PSR.PS (previous supervisor) bit will not be affected by a 
watchdog reset. 


In all cases, the MCNTL.BT (Boot Mode) bit will be set by a watchdog reset. 
For more information on error mode, see Chapter 13. 
12.6 Traps and the Store Buffer




When the SuperSPARC CPU takes a trap, it disables further traps by clearing
PSR.ET (Enable Traps). This makes the CPU vulnerable: if a synchronous
(that is, non-interrupt) exception occurs while PSR.ET is clear, the CPU enters
error mode. Error mode initiates a watchdog reset.


Several steps are taken to help avoid this second exception and error mode. 


□ Before traps are disabled (PSR.ET is still 1) and before the exception
handler is fetched, the store buffer completes all pending writes to memory.
This prevents any user-level faults caused by these writes from occurring
at unexpected or unsafe points within the trap handler.

□ If an exception does occur on this store buffer flush, a data store error
trap is taken instead of the pending trap.

□ If the buffered stores complete successfully, the normal trap-handling
sequence continues, and PSR.ET is cleared.


See Subsection 10.6.5. 
12.7 Interrupts


Interrupt level n traps are triggered by hardware interrupt requests. The
normal source of these requests is external assertion of the IRL[3:0]
pins. In addition to these external requests, SuperSPARC can generate
internal interrupt requests from several sources.


External interrupts are generated by the system in a system-dependent
manner. Their definitions are also system-dependent. SuperSPARC's IRL[3:0]
pins are level-sensitive. System hardware can control these pins in an
implementation-dependent manner to implement different system-interrupt
architectures. When used with MXCC, the IRL pins are driven directly from MXCC.
For additional information on interrupts when using MXCC, see Chapter 16.


The IRL[3:0] pins are sampled in successive cycles to debounce transients.
The IRL level must remain stable for three cycles before it is recognized as an
external interrupt request. External requests are combined with internally
generated requests to form the interrupt request. The IRL[3:0] pins are encoded
with a single interrupt level, and internal requests are also presented at
particular levels. Only the highest-priority interrupt is presented to the pipeline.
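The three-cycle debounce rule can be sketched as a per-cycle filter. This is a behavioral model under stated assumptions, not a description of the actual circuit: the hypothetical irl_sample function is called once per cycle with the pin value and any internal request level, and it returns the higher of the debounced external level and the internal level.

```c
#include <stdint.h>

/* Hypothetical debounce state for IRL[3:0]. */
struct irl_filter {
    uint8_t last;        /* most recent IRL[3:0] sample */
    unsigned stable;     /* consecutive cycles `last` has been sampled */
    uint8_t recognized;  /* debounced external level, 0 = none */
};

/* Call once per cycle; returns the level presented to the pipeline. */
uint8_t irl_sample(struct irl_filter *f, uint8_t pins, uint8_t internal)
{
    pins &= 0xFu;
    internal &= 0xFu;

    if (pins == f->last) {
        if (f->stable < 3u)
            f->stable++;
    } else {
        f->last = pins;       /* level changed: restart the count */
        f->stable = 1u;
    }
    if (f->stable >= 3u)      /* stable for three cycles: recognize it */
        f->recognized = f->last;

    return f->recognized > internal ? f->recognized : internal;
}
```

A one-cycle glitch on the pins therefore never changes the recognized level, while a higher internal (breakpoint) request is presented immediately.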


The PSR.ET (enable traps) bit must be set for any interrupt to be accepted
(including non-maskable interrupts). Requests are ignored as long as PSR.ET
is clear.


When PSR.ET is set and there is an interrupt request, the request is compared
to the current processor interrupt level (PSR.PIL). If the request is at a higher
level than the current PIL, or is equal to 15 (non-maskable), it is prioritized with
other exceptions. The lowest-ranking exception is taken.
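The acceptance test can be written as a predicate on the PSR. It assumes the SPARC Version 8 PSR layout (PIL in bits 11:8, ET in bit 5); the function name is hypothetical, and a true result only means the request goes on to be prioritized with other exceptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define PSR_ET        (1u << 5)                 /* enable traps */
#define PSR_PIL(psr)  (((psr) >> 8) & 0xFu)     /* processor interrupt level */

/* True if an interrupt request at `level` (1 to 15) is eligible. */
bool interrupt_eligible(uint32_t psr, unsigned level)
{
    if (!(psr & PSR_ET) || level == 0)
        return false;                 /* traps disabled or no request */
    return level == 15 || level > PSR_PIL(psr);
}
```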


A valid instruction must exist at the WB stage of the pipeline before the inter- 
rupt will take a trap. Interrupt traps are processed like other traps. 


12.7.1 Interrupt Latency 


Maximum interrupt latency is determined by internal and external factors, such
as the maximum length of certain floating-point operations, system memory
latency times, and the maximum store buffer depth. In addition, interrupt
latency can be increased by the time spent executing with PSR.ET clear. PSR.ET
is clear on entry into trap handlers and may be cleared at other times by
supervisor software.


Note that interrupts are effectively positioned at the last valid instruction in a 
particular instruction group. In addition, a valid instruction must be present in 
order for the interrupt to be registered. If the execution pipeline is not progress- 
ing (waiting on cache misses, for example), an interrupt will not be taken. Due 
to this, maximum interrupt latency for SuperSPARC is highly system-depen- 
dent. 


An undesirable side effect of flushing the store buffer on a trap is a higher
maximum interrupt latency. Every interrupt requires a copy-out, which in the worst
case involves eight unbuffered writes to memory. This delay may be unaccept-
able for real-time applications. The copy-out requirement can be avoided,
however, by running with the store buffer disabled; in this case, the store buffer
is always empty. Fast interrupt response is guaranteed, although overall
performance is reduced by the resulting synchronous external stores.


12.7.2 Internal Interrupt Sources 
The only internal source for interrupt requests is breakpoints. The breakpoint
logic can be programmed to cause interrupts at a software-controlled level on 
certain events. These may be code-, data-, or counter-generated breakpoints. 
Several registers, including MDIAG.BKC (breakpoint control), ACTION (ac- 
tion on breakpoint event), and the counter registers determine when these in- 
terrupts are generated. The level generated for these interrupts is defined in 
the ACTION.BCIPL (breakpoint and counter-interrupt level). 


See Section 15.2. 


12.8 Trap Descriptions 


Table 12-1 shows the trap types that can occur in SuperSPARC. Each type 
of trap is described below. 


12.8.1 Reset Trap 


The reset trap does not set the trap type, TBR.tt, except for watchdog reset 
induced by error mode entered from an RETT instruction (see Section 12.5). 
It is the highest-priority trap. 

SuperSPARC recognizes three types of reset: 

□ Hardware Reset.

□ Built-in Self-Test (BIST) Reset.

□ Watchdog Reset.


For a detailed description of reset operation, see Chapter 13.


12.8.2 Data Store Error Trap 


The data store error trap is trap type 0x2B, priority 2.
The data_store_error trap is caused when a store buffer copy-out encounters an
external bus error (e.g., a parity or ECC problem). For a detailed description,
see Subsection 10.6.5.


12.8.3 Instruction Access Exception Trap 
The instruction access exception trap is trap type 0x01, priority 5. 
Instruction access exception may be caused by many events. The normal
source of this error is MMU protection errors for pages that have been
swapped out of virtual memory space. Bus errors and code breakpoints can
also cause this exception. See Subsection 9.12.3 for a detailed description of
error reporting for these traps. Note that the fault address register (FAR) is
never updated for these errors.


12.8.4 Privileged Instruction Trap 
The privileged instruction trap is trap type 0x03, priority 6. 
The privileged_instruction trap is caused when PSR.S is clear and a privileged
instruction is issued. There are many classes of privileged instructions; see
The SPARC Architecture Manual for a complete list.
12.8.5 Illegal Instruction Trap


The illegal instruction trap is trap type 0x02, priority 7. 


The illegal_instruction trap (sometimes also called the unimplemented
instruction trap) is caused when an instruction with an unassigned or illegal
opcode is executed. This trap may also signal instructions that are used in
inappropriate ways or with invalid fields or data. For example, SuperSPARC
takes an illegal instruction trap when an RETT instruction is executed with
the PSR.ET bit set, and in several data-dependent cases of the WRPSR
instruction. An integer divide instruction can also generate a data-dependent
illegal instruction trap if the dividend has significant bits beyond bit 51.


12.8.6 Floating-Point-Disabled Trap 


The floating-point-disabled trap is trap type 0x04, priority 8. 


The fp_disabled trap is caused when PSR.EF is cleared and any floating-point
instruction is issued. This trap is never taken for integer multiply or
integer divide instructions.


12.8.7 Coprocessor-Disabled Trap 


The coprocessor-disabled trap is trap type 0x24, priority 8. 


SuperSPARC does not provide a coprocessor interface. The PSR.EC (enable
coprocessor) bit is zero and non-writable. Any attempt to set the PSR.EC bit
generates an illegal instruction trap, and any attempt to execute a
coprocessor instruction generates a cp_disabled trap. SuperSPARC never
generates a cp_exception trap.


12.8.8 Window Overflow Trap 


The window overflow trap is trap type 0x05, priority 9. 


The window_overflow trap is caused by executing a SAVE instruction when the
next register window (CWP-1) is marked invalid in the WIM register. Note that
this trap is generated only in response to a SAVE instruction; taking a trap
cannot cause this exception.


12.8.9 Window Underflow Trap 
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The window underflow trap is trap type 0x06, priority 9. 


The window_underflow trap is caused by a RESTORE or RETT instruction that
would make the incremented value of CWP point to an invalid register window.
In the case of RETT, which executes with traps disabled (PSR.ET = 0), this
trap immediately leads to error mode and a watchdog reset.
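Both window checks reduce to modular arithmetic on CWP followed by a bit test on WIM. The C sketch below models them, assuming NWINDOWS = 8 (SuperSPARC implements eight register windows) and a CWP value already in range; the function names are hypothetical, and the sketch does not model trap entry, which decrements CWP without consulting WIM.

```c
#include <stdbool.h>
#include <stdint.h>

#define NWINDOWS 8u  /* SuperSPARC implements 8 register windows */

/* SAVE moves to window CWP-1 (mod NWINDOWS); it traps if WIM marks that window. */
static bool save_overflows(uint32_t cwp, uint32_t wim)
{
    uint32_t new_cwp = (cwp + NWINDOWS - 1u) % NWINDOWS;
    return (wim >> new_cwp) & 1u;
}

/* RESTORE and RETT move to window CWP+1 (mod NWINDOWS); they trap if WIM marks it. */
static bool restore_underflows(uint32_t cwp, uint32_t wim)
{
    uint32_t new_cwp = (cwp + 1u) % NWINDOWS;
    return (wim >> new_cwp) & 1u;
}
```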
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12.8.10 Memory Address Not Aligned Trap 
The memory address not aligned trap is trap type 0x07, priority 10. 


The mem_address_not_aligned trap is caused by a load, store, or control
transfer operation that violates SPARC memory alignment rules. Instructions
(and therefore control transfer targets) must be word-aligned. Halfword
references require the low-order address bit to be zero, word references
require the two low-order address bits to be zero, and doubleword references
require the three low-order address bits to be zero.
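The alignment rule is a mask test on the low-order address bits. A C sketch, assuming size is the access size in bytes (1, 2, 4, or 8) and using a hypothetical function name; instruction fetches and control-transfer targets correspond to size 4.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr is naturally aligned for an access of `size` bytes (a power of two). */
static bool is_aligned(uint32_t addr, uint32_t size)
{
    return (addr & (size - 1u)) == 0u;
}
```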


12.8.11 Floating-Point Exception Trap 


The floating-point exception trap is trap type 0x08, priority 11. 


Floating-point exception (fp_exception) traps are caused by the detection of
certain floating-point arithmetic conditions. These exceptions are deferred
traps: they are reported only when another floating-point operation (or
floating-point event) is executed at some later time. The delay before the
trap is taken is variable, depending on the pipeline state, numeric
conditions, and the exact sequence of instructions. Improper floating-point
error recovery can also cause these exceptions, particularly sequence errors.

The cause of the fp_exception can be determined from the FSR.ftt field, which
can be decoded according to Table 4-3. The instruction that caused the
exception can be determined by reading the state of the floating-point queue,
described in Section 11.2. (See also Section 4.9.)
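A small C sketch of extracting and naming FSR.ftt, assuming the SPARC Version 8 field position (FSR bits 16:14) and the Version 8 encodings; Table 4-3 remains the authoritative SuperSPARC decoding, and the helper names are hypothetical.

```c
#include <stdint.h>

/* FSR.ftt occupies FSR bits 16:14 in SPARC V8. */
static unsigned fsr_ftt(uint32_t fsr)
{
    return (fsr >> 14) & 0x7u;
}

/* Names of the SPARC V8 floating-point trap types, indexed by ftt. */
static const char *ftt_name(unsigned ftt)
{
    static const char *const names[8] = {
        "none", "IEEE_754_exception", "unfinished_FPop", "unimplemented_FPop",
        "sequence_error", "hardware_error", "invalid_fp_register", "reserved"
    };
    return names[ftt & 0x7u];
}
```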

12.8.12 Data Access Exception Trap 
The data access exception trap is trap type 0x09, priority 13. 
Data access exception traps are generally caused by MMU protection errors for
load and store operations. They may also be caused by bus errors and data
breakpoints. See Subsection 9.12.3 for a detailed description of error
reporting for these exceptions.

12.8.13 Tagged Operation Overflow Trap 
The tagged operation overflow trap is trap type 0x0a, priority 14.
The tag_overflow trap is caused by execution of the tagged add or subtract and
trap on overflow instructions (TADDccTV and TSUBccTV). The trap is signalled
when either of the two least significant bits of either source operand is
nonzero, or when the operation produces a result that sets the overflow flag.
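A C sketch of the trap condition, assuming 32-bit two's-complement operands; the function names are hypothetical. The overflow terms are the usual sign tests: an add overflows when both operands have the same sign and the result's sign differs, and a subtract overflows when the operands' signs differ and the result's sign differs from the first operand's.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if TADDccTV with operands a and b would take a tag_overflow trap. */
static bool taddcctv_traps(uint32_t a, uint32_t b)
{
    uint32_t sum = a + b;                              /* wraps modulo 2^32 */
    bool tag_bits = ((a | b) & 0x3u) != 0u;            /* either tag nonzero */
    bool ovf = ((~(a ^ b) & (a ^ sum)) >> 31) != 0u;   /* signed overflow */
    return tag_bits || ovf;
}

/* True if TSUBccTV computing a - b would take a tag_overflow trap. */
static bool tsubcctv_traps(uint32_t a, uint32_t b)
{
    uint32_t diff = a - b;
    bool tag_bits = ((a | b) & 0x3u) != 0u;
    bool ovf = (((a ^ b) & (a ^ diff)) >> 31) != 0u;
    return tag_bits || ovf;
}
```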
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12.8.14 Integer Divide by Zero Trap 
The integer divide by zero trap is trap type 0x2a, priority 15. 


The division_by_zero trap is caused when an integer divide instruction is
issued with the value of the second operand (the divisor) being zero.

If both the divide-by-zero and the illegal instruction conditions exist, the
illegal instruction trap is signalled.

12.8.15 Trap Instructions (Ticc) 
Trap instructions are trap types 0x80-0xff (instruction dependent), priority 16.

The trap type is computed by taking the low-order 7 bits of the sum of the two
operands of the Ticc instruction and adding the result to 0x80.

Trap instruction traps are software-initiated traps caused by the execution of
Ticc instructions.
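The trap-type computation can be written directly in C. A sketch with a hypothetical function name, where op2 stands for either the second source register or the sign-extended immediate; unsigned arithmetic wraps modulo 2^32, matching the 32-bit sum.

```c
#include <stdint.h>

/* Trap type for a taken Ticc: 0x80 plus the low-order 7 bits of rs1 + op2. */
static uint32_t ticc_trap_type(uint32_t rs1, uint32_t op2)
{
    return 0x80u + ((rs1 + op2) & 0x7Fu);
}
```

Because only seven bits of the sum survive, every software trap lands in 0x80-0xff regardless of the operand values.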


12.8.16 Interrupt Levels 


Interrupt levels are trap types 0x11-0x1f (request level dependent),
priorities 17-31.

Interrupt level n traps are generated by hardware interrupt requests. The
normal source of these requests is external assertion of the IRL[3-0] pins.
In addition to these external requests, SuperSPARC can generate internal
interrupt requests. See Section 12.7.


Generation and definition of interrupt requests is system-dependent. 
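Both the trap type and the priority follow from the request level n (1 to 15): tt = 0x10 + n, and the priority runs from 17 for level 15 down to 31 for level 1, as in the SPARC Version 8 trap table. A C sketch with hypothetical function names:

```c
/* level is the interrupt request level on IRL[3-0], 1 to 15 (0 = no request). */
static unsigned interrupt_trap_type(unsigned level)
{
    return 0x10u + level;   /* level 1 -> 0x11, level 15 -> 0x1f */
}

static unsigned interrupt_priority(unsigned level)
{
    return 32u - level;     /* level 15 -> 17 (highest), level 1 -> 31 */
}
```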
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The SuperSPARC processor (SSP) can be reset in three ways: 
•  Via its RESET pin,

•  Automatically at the completion of Built-In Self-Test, or

•  Automatically after entering error mode.

SuperSPARC must be reset when power is first applied. Its internal state is
greatly altered by a reset.

The MultiCache Controller (MXCC) may also be reset in several ways, all of
which greatly alter its internal state.
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13.1 Reset Types 
The SSP can be reset in three ways: 


•  Hardware reset is initiated from outside the processor by asserting the
   RESET signal. SuperSPARC must be reset when power is first applied. It is
   also possible to cause a hardware reset in emulation mode or by boundary
   scan.

•  Built-In Self-Test (BIST) generates a second type of reset when it
   completes. BIST operations can be requested either by software (with an
   STA) or via the JTAG interface. When BIST is initiated through software
   using STA, an internal reset is automatically generated. When BIST is
   initiated by JTAG, however, reset is not automatically generated; it can
   be generated by entering the TAP reset state, either by holding TMS
   asserted for five consecutive TCK cycles or by asserting TRST.

•  Watchdog reset is an internally generated reset caused by entry into error
   mode.

The state changes in the SSP due to each type of reset are shown in
Table 13-1.
Table 13-1. Register State After Reset

Register or Field                    | Hardware Reset             | BIST Reset                 | Watchdog Reset
-------------------------------------+----------------------------+----------------------------+---------------------------
Boot Mode (MCNTL.BT)                 | 1 (Boot mode)              | 1 (Boot mode)              | 1 (Boot mode)
No Fault (MCNTL.NF)                  | 0 (Faults enabled)         | 0 (Faults enabled)         | Unchanged
Instruction Cache Enable (MCNTL.IE)  | 0 (Instruction cache       | 0 (Instruction cache       | Unchanged
                                     |   disabled)                |   disabled)                |
Store Buffer Enable (MCNTL.SB)       | 0 (Store buffer disabled)  | 0 (Store buffer disabled)  | Unchanged
Multiple Instruction (ACTION.MIX)    | 0 (Single instruction      | 0 (Single instruction      | 0 (Single instruction
                                     |   execution)               |   execution)               |   execution)
Program Counters (PC, nPC)           | PC = 0x0, nPC = 0x4        | PC = 0x0, nPC = 0x4        | PC = 0x0, nPC = 0x4
BIST Status                          | 00 (No BIST since reset)   | Nonzero (BIST run since    | Not affected
                                     |                            |   reset)                   |
Processor Status Register (PSR)      | S=1, ET=0, EC=0, Impl=4,   | S=1, ET=0, EC=0, Impl=4,   | S=1, ET=0, EC=0, Impl=4,
                                     | Ver=0; PSR.CWP             | Ver=0; PSR.CWP             | Ver=0; PSR.CWP unchanged
                                     | uninitialized              | uninitialized              |
Fault Status Register (MFSR)         | Uninitialized (except for  | Uninitialized (except for  | Unchanged (except for
                                     |   MFSR.EM bit)             |   MFSR.EM bit)             |   MFSR.EM bit)
Emulation Facilities                 | Disabled                   | (illegible in source)      | Unchanged

[Rows for Data Cache Enable, Shadow Fault Status, and several other fields are illegible in the source and are not reproduced.]

Values shown as "uninitialized" are not set to any guaranteed state after
reset. These values should be initialized before use.

Values shown as "unchanged" are not affected by the indicated type of reset.
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Note: 


After any reset, SuperSPARC will execute in single-instruction execution 
mode. Multiple-instruction-per-cycle execution is enabled by setting the 
ACTION.MIX bit. 





13.1.1 Boot Mode 


Boot mode is a special mode of the Memory Management Unit (MMU) in which all
instruction accesses (and alternate-space references through ASIs 0x08 and
0x09) pass their virtual address in physical address bits 27 through 0, and
the upper eight physical address bits (35 through 28) are set to 0xFF.
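The boot-mode translation amounts to replacing the upper address bits. A C sketch, assuming a 36-bit physical address held in a uint64_t and assuming that virtual address bits 31:28 are simply discarded (the text above does not say what happens to them); the function name is hypothetical.

```c
#include <stdint.h>

/* Boot-mode physical address: PA[35:28] = 0xFF, PA[27:0] = VA[27:0]. */
static uint64_t boot_mode_pa(uint32_t va)
{
    return ((uint64_t)0xFF << 28) | (va & 0x0FFFFFFFu);
}
```

With va = 0x0, this yields 0xFF0000000, the reset handler address used throughout this chapter.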


Boot mode is entered after any reset (hardware, BIST, or watchdog). It may be
disabled by explicitly clearing the MCNTL.BT bit. Note that boot mode
overrides the MMU enable for instruction accesses. Boot mode does not affect
data references.

During boot mode, the MCNTL.AC bit is ignored for instruction references and
for alternate-space transactions through the instruction-space ASIs (0x08 and
0x09), making these accesses to instruction space non-cacheable when BT=1.
However, the AC bit is still used for data references in this mode.


13.1.2 Determining Reset Type 
Since all three types of reset take a reset trap with MCNTL.BT set, all enter
the reset handler at physical location 0xFF0000000. The reset handler should
examine MFSR.EM and BIST.STATUS to determine the type of reset. See
Table 13-2.


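Table 13-2. Decoding Reset Types

Reset Type               | MFSR.EM | BIST.STATUS
-------------------------+---------+----------------------------------
Hardware (power-on)      | 0       | 0 (No BIST since reset)
BIST                     | 0       | Nonzero (1 = short, 2 = long BIST)
Watchdog (error mode)    | 1       | Not affected
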


Subject to Change Without Notice 







13.2 Hardware Reset 


Hardware reset is requested by asserting the RESET pin for at least eight
cycles and then deasserting it. As soon as RESET is asserted, SuperSPARC
deasserts or disables all outputs except TDO and ESB. External logic should
monitor RESET to determine the validity of control signals.


13.2.1 Hardware Reset Requirements 


When SuperSPARC is first powered on, the phase-locked loop (PLL) may need a
long period to stabilize, approximately 100 milliseconds. RESET must remain
asserted during this time; unpredictable operation will result if RESET is
released before the PLL has stabilized.

At power-on, RESET must be held asserted for 16 cycles after the PLL has
stabilized. At other times, when both power and the PLL remain stable, RESET
can be asserted for as few as eight cycles.





Note: 


In order for SuperSPARC to properly reset, care must be taken in the system 
implementation. In particular, JTAG operation may affect SuperSPARC's 
ability to reset (the JTAG TAP controller should be in the reset state when 
hardware reset is asserted). 





13.2.2 Hardware Reset Sequence 


SuperSPARC takes several actions in response to a hardware reset request. 


Once RESET has been deasserted, SuperSPARC spends more than 340 cycles
initializing internal logic. This initialization is internally timed and
takes the same number of cycles on all devices of the same stepping. During
this time, the cache column redundancy repair circuits are configured, and
all chip outputs are held inactive or in high impedance.

Next, internal registers are set as listed in Table 13-1. Then SuperSPARC
takes a reset trap (see Chapter 12).

This forces execution to begin at virtual address 0x0. Since boot mode is
set, physical address 0xFF0000000 is used to fetch instructions from memory.
System software can distinguish a hardware reset from a watchdog reset
because the MFSR.EM (error mode) bit is cleared. Since the ACTION.MIX
(multiple instruction execution) bit is cleared, SuperSPARC's superscalar
execution is disabled, and at most one instruction is executed in each cycle.

None of the entries in the store buffer, data cache, instruction cache, or
TLB (except lock bits) change. Software should initialize each resource
before enabling it.
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13.3 Watchdog Reset 
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In addition to the hardware reset, there is an internally generated reset
referred to as a watchdog reset. This reset is caused by entry into error
mode. Error mode is described in Section 12.5 and in The SPARC Architecture
Manual.


To allow recovery from many error-mode conditions, very little state is
affected by a watchdog reset. The only MCNTL bit affected is MCNTL.BT (boot
mode). The MFSR.EM (error mode) bit is set to indicate that this is a
watchdog reset, as opposed to a hardware reset. Since the cache redundancy
logic has already been programmed (during hardware reset), it is not
programmed again. Breakpoints are cleared at watchdog reset.


In VBus configurations, SuperSPARC asserts ERROR for one cycle, allowing the
MXCC (or external system logic) to record the occurrence of error mode. At
the completion of this bus cycle, a watchdog reset is generated.

When the SuperSPARC processor is used directly on the MBus without the MXCC,
AERR is asserted and a watchdog reset is generated. AERR remains asserted
until MFSR.EM is cleared.

Once the above actions have been completed, a reset trap is taken and control
passes to physical address 0xFF0000000, as with hardware reset.
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13.4 Built-In Self-Test (BIST)


SuperSPARC has BIST logic on-chip. BIST is a quick check of device integrity;
it is not an exhaustive proof of device function. Many types of device faults
will be detected by an incorrect signature value after a BIST. There are two
types of BIST: short and long. The long BIST operation, though a more
exhaustive check of the logic than the short BIST, is not supported.

This section describes how BIST operates, how to initiate BIST, and how to
use the results. Also included are warnings regarding BIST operation.


13.4.1 Internal BIST Operation 


BIST uses internal logic scan paths to write pseudo-random test patterns into
the chip logic (internal states). One cycle of execution is then run to let
the states assume their next values; then the state of the logic is captured
through the scan path into a signature analyzer. The signature analyzer
creates a signature value based on the results from the logic and stores
this value in the BIST.SIGNATURE register.
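A signature analyzer of this kind is commonly built as a multiple-input signature register (MISR): a linear-feedback shift register that XORs each captured word into its state, so any change in the captured data is very likely to change the final value. The C model below is only a generic illustration of that technique; SuperSPARC's actual register width, feedback polynomial, and capture order are not given here, so MISR_POLY (borrowed from CRC-32) and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MISR_POLY 0x04C11DB7u  /* hypothetical feedback polynomial (CRC-32's) */

/* One MISR step: shift left, apply feedback if bit 31 shifts out, fold in the capture. */
static uint32_t misr_step(uint32_t state, uint32_t captured)
{
    uint32_t feedback = (state & 0x80000000u) ? MISR_POLY : 0u;
    return ((state << 1) ^ feedback) ^ captured;
}

/* Compresses a sequence of captured logic states into one signature. */
static uint32_t misr_signature(const uint32_t *captures, size_t n)
{
    uint32_t state = 0u;
    for (size_t i = 0; i < n; i++)
        state = misr_step(state, captures[i]);
    return state;
}
```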


13.4.2 BIST Coverage


The BIST sequence checks all normally scannable logic but does not check the
internal memory arrays. The TLB, store buffer, prefetch buffers, cache
arrays, and register files are not checked by BIST.


13.4.3 Signature


The correct signature value is known but is device-stepping-dependent.
Correct signature values for the device will be published in the data sheet.


Different signature values are generated for the long (unsupported) and short
BIST operations.


13.4.4 Initiating BIST 


To initiate BIST, an STA to ASI 0x39 is issued. A store to address 0x0
selects a short BIST. An (unsupported) long BIST is selected by writing to
virtual address 0x100. An ASI 0x39 access to any address other than 0x0 or
0x100 generates a data access exception.


Once requested, internal logic controls the BIST operation. An external reset
aborts the BIST operation. When the sequence completes, an internal reset is
generated (see the hardware reset description above).


BIST may also be initiated through the JTAG interface. For details on
JTAG-initiated BIST, see Chapter 21.
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13.4.5 BIST ASI Operation ASI-0x39 - BIST Diagnostic Interface 


There are two memory-mapped MMU-resident diagnostic registers used to support
BIST (see Table 13-3 and Table 13-4). Any byte, halfword, doubleword, or swap
access to these diagnostic registers is explicitly illegal and generates a
data access exception; only single-word LDA and STA accesses are allowed.


Table 13-3. BIST Diagnostic Registers Within ASI 0x39

Address    | Access | Data                           | Function
-----------+--------+--------------------------------+-------------------------------
0x00000000 | Store  | Don't care (should be zero)    | Start short BIST
0x00000100 | Store  | Don't care (should be zero)    | Start long BIST (unsupported)
0x00000000 | Load   | Signature[31:0]                | Read BIST signature
0x00000100 | Load   | Status[31:0] (see Table 13-4)  | Read 2-bit BIST status


The possible values of the status register are shown in Table 13-4.

Table 13-4. BIST Status Register Values

Value      | Meaning
-----------+----------------------------------
0x00000000 | No BIST run
0x00000001 | Short BIST complete
0x00000002 | Long BIST complete (unsupported)


BIST.STATUS is not affected by watchdog reset, but is cleared to zero by
hardware reset.
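Tables 13-3 and 13-4 can be summarized as a decode function. This C sketch models only the rules stated above (single-word accesses to offsets 0x0 and 0x100 of ASI 0x39); the enum and function names are hypothetical, and swap-type accesses, which also trap, are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

enum bist_op {
    BIST_START_SHORT,     /* store to 0x000 */
    BIST_START_LONG,      /* store to 0x100 (unsupported) */
    BIST_READ_SIGNATURE,  /* load from 0x000 */
    BIST_READ_STATUS,     /* load from 0x100 */
    BIST_DATA_ACCESS_EXC  /* any other address or access size */
};

/* size is the access size in bytes; only single-word (4-byte) LDA/STA is legal. */
static enum bist_op asi39_decode(uint32_t addr, bool is_store, unsigned size)
{
    if (size != 4u || (addr != 0x0u && addr != 0x100u))
        return BIST_DATA_ACCESS_EXC;
    if (is_store)
        return addr == 0x0u ? BIST_START_SHORT : BIST_START_LONG;
    return addr == 0x0u ? BIST_READ_SIGNATURE : BIST_READ_STATUS;
}

/* Meaning of a BIST.STATUS value, per Table 13-4. */
static const char *bist_status_name(uint32_t status)
{
    switch (status) {
    case 0u: return "no BIST run";
    case 1u: return "short BIST complete";
    case 2u: return "long BIST complete (unsupported)";
    default: return "undefined";
    }
}
```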
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13.4.6 Warnings Regarding BIST Operation 


After BIST finishes, SuperSPARC generates an internal reset. This internal
reset behaves similarly to the hardware reset. One consequence of this reset
is that MCNTL is initialized. In particular, the MCNTL.PE bit is cleared, so
that SuperSPARC does not check parity but generates inverted parity on the
pins. This internally generated reset is not seen by a cache controller such
as the MXCC, so if parity was enabled in the cache controller before BIST
started, it is still enabled in the cache controller after BIST. The result
is a parity mismatch between SuperSPARC and the cache controller. Software
should check the PE bit in the MXCC control register after BIST has completed
and adjust SuperSPARC's MCNTL.PE bit before attempting any writes. To be
safe, the following algorithm can be used:


1) Before starting BIST, flush the store buffer (even though the STA to
   ASI 0x39 also does so).


2) Check the MXCC status register and wait until nothing is pending.

3) Start BIST.

4) After BIST finishes, read the BIST status.

5) Read the BIST signature.

6) Read the MXCC control register.

7) Set SuperSPARC's MCNTL.PE bit if the CCCR.PE bit is set.

8) Continue.

Even though the MXCC is used here as an example, any other system component
could have a similar problem, since it does not see SuperSPARC's internally
generated reset. Care must therefore be taken in recovering the system after
BIST finishes.
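Step 7 is the essential fix-up: make SuperSPARC's MCNTL.PE agree with the PE bit in the MXCC control register. A C sketch of that bit manipulation, with both PE masks passed in as parameters because their bit positions are not given in this section; the function name is hypothetical. It also clears MCNTL.PE when the MXCC's PE bit is clear, which satisfies the requirement that both PE bits be identical before the first write.

```c
#include <stdint.h>

/* Returns an MCNTL value whose PE bit matches the MXCC control register's PE bit. */
static uint32_t match_parity_enable(uint32_t mcntl, uint32_t mcntl_pe_mask,
                                    uint32_t mxcc_cr, uint32_t mxcc_pe_mask)
{
    if (mxcc_cr & mxcc_pe_mask)
        return mcntl | mcntl_pe_mask;   /* MXCC checks parity: enable it here too */
    return mcntl & ~mcntl_pe_mask;      /* MXCC does not: keep MCNTL.PE clear */
}
```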
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13.5 Reset In MXCC 


After a system reset (RSTIN), the contents of the external cache are
undefined, and the external cache is disabled. Start-up software enables the
external cache by setting the cache enable (CE) bit in the cache controller
configuration register. Before enabling the cache, however, software should
initialize it by clearing the Valid (V) and Pending (P) bits for each
subblock in the tag of every block in the cache.

Reset can come from:

•  the system RSTIN,

•  MXCC BIST, or

•  the SSP.

The processor can initiate two different resets:

•  watchdog (WD) reset, and

•  software internal (SI) reset.


Remote processors on the system bus can initiate only software internal
resets. The reset register is used to determine the type of reset.


13.5.1 System Reset 
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On system reset, MXCC does the following: 

•  Asynchronously disables all DATA/ADDR output drivers on the VBus and all
   bidirectional output drivers on the MBus/XBus (all go to high impedance).

•  Drives all control strobes on the VBus high.

•  Resets the SuperSPARC processor by asserting RESET.

•  Disables the external cache.

•  Resets all finite state machines.

•  Resets all internal queues.

•  Resets the MXCC control register, status register, interrupt pending
   register, and reset register.

•  Sets the interrupt mask register to all 1's.
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After system reset, MXCC does the following: 
•  Continues to reset the processor for eight cycles.

•  Configures external cache tag RAM column redundancy for 150 cycles. During
   this time, bidirectional control strobes are three-stated and
   unidirectional output control strobes are deasserted.

•  After configuring external cache tag RAM redundancy, asserts RGRT and
   WGRT.


13.5.2 Software Internal Reset
On a software internal reset, MXCC does the following:

•  Deasserts RGRT and WGRT.

•  Waits for pending operations to complete (but external cache updates are
   not completed).

•  Clears SXP in the status register and the WD bit in the reset register.

•  Resets the processor for eight cycles.

On a software internal reset, the PE bits in the MXCC and SuperSPARC may
differ. System software must ensure that both PE bits are identical before
issuing the first write after an SI reset.
13.5.3 Watchdog Reset
On a watchdog reset, MXCC does the following:

•  In an MBus configuration, asserts AERR.

•  Sets the WD bit in the reset register.


13.5.4 BIST Reset
At the completion of a built-in self-test (BIST), the MXCC:

1) Resets its internal state,
2) Clears WD and SI in the reset register,
3) Asserts both RGRT and WGRT, and
4) Resumes normal operation.


13.5.5 Reset Register 


The MXCC's reset register is shown in Figure 16-13. Its contents can be used
to distinguish between types of reset.
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Table 13-5 shows the values in the WD and SI fields after Hardware, WD, 
BIST, and SI Resets. The value X in the table is interpreted as “don’t care". 


Table 13-5. The WD and SI Fields after Reset





13.5.6 MultiCache Controller Reset Requirements 
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To ensure the proper operation of the MXCC, the following requirements must 
be met by the system for reset: 


•  When the MXCC is first powered on, the phase-locked loops may need a long
   period to stabilize, approximately 100 milliseconds. RSTIN must remain
   asserted during this time; unpredictable operation will result if RSTIN
   is released before the PLLs have stabilized.

•  At power-on, RSTIN must be held asserted for 16 cycles after the PLLs
   have stabilized. At other times, when both power and the PLLs remain
   stable, RSTIN can be asserted for as few as eight cycles.

•  RSTIN can be asynchronous to either or both of BCLK and PCLK.

•  JTAG reset (TRST) must be asserted at power-on for a minimum of 50 ns.
   TRST can be asynchronous to any or all of BCLK, PCLK, and TCK. Two TCK
   cycles must elapse after TRST is deasserted before TMS may be asserted.

•  After RSTIN is deasserted, there should be no requests from the XBus or
   MBus for a minimum of 150 PCLK cycles, to allow the external cache tag
   memory column redundancy programming to complete. There also should be
   no JTAG operations during this time.

•  All the three-state outputs on the MBus or the XBus (as selected by
   MBSEL) are placed in their high-impedance state. It is the
   responsibility of the system logic to ensure that these signals remain
   in their appropriate states, with pull-ups as necessary.

•  After a boundary or internal scan test, TRST and RESET should be asserted
   in the same way as during power-on reset for the chip to enter normal
   operation mode.

•  RSTIN should be held deasserted during internal scan.
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•  RESET is asserted to the processor asynchronously as soon as RSTIN is
   asserted. The MXCC keeps asserting RESET for eight cycles after RSTIN is
   deasserted. SuperSPARC disables all bidirectional signals on the VBus
   asynchronously when RESET is asserted. While RSTIN is asserted, the MXCC
   drives the bidirectional VBus signals toward VCC with weak drivers. After
   RSTIN is deasserted, the MXCC drives all the bidirectional control signals
   to logic high and then releases them before RESET is deasserted.
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Chapter 14 


Startup Procedure 





This chapter contains guidelines for developing a reset handler for the
SuperSPARC processor (SSP) and the MultiCache Controller (MXCC). An example
reset handler is presented.
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14.1 Reset Handling 


This is a guideline for writing a reset handler for the SSP. Many aspects of
the handler are system-dependent; each system requires careful development
of its own specific reset handler. Figure 14-1 is a flowchart of a reset
handler that might serve for a variety of systems. The example reset handler
developed in this chapter follows Figure 14-1.


Figure 14-1. Reset Handler Flowchart
[Flowchart not reproduced; the only legible label is "Start short BIST".]
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In Figure 14-1, power-on reset automatically invokes a short built-in self-test 
(BIST). When BIST completes, it generates a BIST reset, which completes the 
initialization of the machine. 


See Chapter 13 and Chapter 23 for specific requirements for the timing of
RESET and CLK.
Type Determination 


The first step for a reset handler is to determine what kind of reset caused
the reset trap. Resets that can cause reset traps fall into one of the
following three categories (see Section 13.1):

•  Error Mode (Watchdog) Reset,

•  BIST Reset, or

•  Power-on (Hardware) Reset.


Table 13-2 shows the differences between the three reset types that can be
used to determine the type of reset.


Error mode resets cause MFSR.EM to be set on entry to the reset handler (see
Subsection 9.12.1). For other types of reset, this bit is 0. BIST resets set
BIST.STATUS to a nonzero value. A hardware reset can be identified because
neither MFSR.EM nor BIST.STATUS is nonzero. This reset handler routine
preserves all of the registers except %y and EM_RST_SCRATCH, so that the
error mode reset handler can log them for debugging. EM_RST_SCRATCH is a
register that the error mode reset code can use as scratch (usually an
easily reconstructible global register is chosen for this purpose).
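The order of the tests matters: BIST.STATUS is not affected by a watchdog reset, so it may still be nonzero from an earlier BIST, and MFSR.EM must therefore be tested first. A C model of the decision, using the MFSR.EM mask 0x00020000 that Example 14-1 uses; the enum and function names are hypothetical.

```c
#include <stdint.h>

#define MFSR_EM 0x00020000u  /* MFSR.EM mask, as used in Example 14-1 */

enum reset_type { RESET_HARDWARE, RESET_BIST, RESET_WATCHDOG };

static enum reset_type decode_reset(uint32_t mfsr, uint32_t bist_status)
{
    if (mfsr & MFSR_EM)
        return RESET_WATCHDOG;  /* error mode was entered */
    if (bist_status != 0u)
        return RESET_BIST;      /* 1 = short BIST, 2 = long BIST */
    return RESET_HARDWARE;      /* neither indicator set: power-on reset */
}
```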


On entry to the reset handler (see Table 12-1):

•  PSR.S is set (supervisor mode).

•  PSR.ET is clear (traps disabled).

•  The Memory Management Unit (MMU) is disabled.

•  MCNTL.BT (boot mode) is set.

•  The caches are disabled.

Example 14-1 shows the entry and reset type determination for the example 
reset handler. 







Example 14-1. Reset Handler Entry and Reset Type Determination


reset_handler:
        mov     %g1, %y                 ! save %g1 in %y
                                        ! Only %y and
                                        ! EM_RST_SCRATCH are
                                        ! corrupted on the way
                                        ! to em reset.
                                        ! check if error mode:
        set     0x300, %g1              ! MFSR_RW_VADDR
                                        ! ld & clr mfsr
        lda     [%g1] 0x04, EM_RST_SCRATCH      ! MFSR_RW_ASI
                                        ! restore mfsr
        sta     EM_RST_SCRATCH, [%g1] 0x04      ! MFSR_RW_ASI
        set     0x00020000, %g1         ! MFSR_EM
        and     %g1, EM_RST_SCRATCH, EM_RST_SCRATCH
        mov     %psr, %g1               ! save %psr in %g1
        cmp     EM_RST_SCRATCH, 0       ! MFSR.EM = 1 ?
        bne     error_mode_reset        ! yes, error mode reset
        nop                             ! don't restore psr
                                        ! with unknown

check_if_BIST:
        set     0x00000100, %g1         ! BIST_STATUS_VADDR
        lda     [%g1] 0x39, %g2         ! BIST_STATUS_ASI
        cmp     %g2, 0                  ! BIST_STATUS = 0 ?
        be      power_on_reset          ! yes, power-on reset
        nop
        ba,a    bist_reset              ! else BIST reset


14.1.1 Error Mode (Watchdog) Reset 


An error mode reset has occurred if MFSR.EM was set on entry to the handler.
On entry, the "borrowed" registers are restored; only the contents of %y and
EM_RST_SCRATCH have been destroyed. The actions of the error mode reset
handler are system-specific. This routine can be used to log the trap type
and other registers for debugging. After logging, the routine may attempt to
restart the system by treating this as a power-on reset.


Example 14-2 illustrates the routine for the error mode reset. 


Example 14-2. Error Mode Reset


error_mode_reset:
        mov     %g1, %psr               ! restore %psr
        mov     %y, %g1                 ! restore %g1
                                        ! code goes here to log
                                        ! the error mode and
                                        ! save registers for
                                        ! debugging.
                                        ! Continue with full
                                        ! power-on reset.
        ba,a    power_on_reset


14.1.2 BIST Reset


A BIST reset occurs at the completion of a long or short BIST. In the scheme
used in Example 14-3:


1) Power-on reset initiates a short BIST.

2) When the BIST completes, it causes another reset trap.


3) After this second reset is decoded as a BIST reset, control is passed to
   bist_reset, where the BIST signature is checked (see Section 13.4).


4) The processor is then initialized as if for a power-on reset without BIST. 


5) Control passes out of the reset handler into the initial program, which 
might perform additional processor and system tests or load a program. 
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Example 14-3. BIST Reset 


bist_data:
        .word   BIST_SHORT_SIG
        .word   BIST_LONG_SIG

bist_reset:                             ! %g2 contains the BIST
                                        ! status: 1 if short
                                        ! BIST, 2 if long BIST;
                                        ! we don't get here
                                        ! with %g2 = 0
                                        ! read the BIST signature
        lda     [%g0] 0x39, %g3         ! BIST_SIG_VADDR,
                                        ! BIST_SIG_ASI
        set     bist_data, %g1
        cmp     %g2, 2                  ! BIST_STATUS = 2 is long
                                        ! BIST
        be,a    long_bist_done
        add     %g1, 0x4, %g1           ! bump pointer if long
                                        ! BIST
long_bist_done:
short_bist_done:
        ld      [%g1], %g4              ! load correct
                                        ! signature
        cmp     %g3, %g4                ! check actual vs
                                        ! correct
        be      internal_post           ! if good, set up normal
        nop                             ! env
bist_bad_sig:                           ! If you get here the
                                        ! BIST signature
                                        ! doesn't match.
                                        ! Probably want to do
                                        ! some failure logging.


14.1.3 Power-On (Hardware) Reset


At power-on, some rudimentary internal and external tests are recommended.
One test is to run BIST. The routine in Example 14-4 either starts a short
BIST or, if SKIP_BIST_AT_POWER_ON is defined, branches directly to the
internal power-on self-test. BIST never returns, but it generates a reset
trap at completion.
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Example 14—4. Power-On Reset 


power_on_reset:
#ifndef SKIP_BIST_AT_POWER_ON
                                        ! here to start a short
                                        ! BIST
                                        ! BIST_START_SHORT_VADDR
                                        ! is 0
        sta     %g0, [%g0] 0x39         ! BIST_START_SHORT_ASI
        nop
        nop
        nop

bist_not_started:                       ! If you get here BIST
                                        ! didn't start. Should
                                        ! probably log the
                                        ! failure.
#else
        ba,a    internal_post
#endif
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14.2 Power-On Self-Test (POST) 
A POST tests parts of the processor and system that BIST did not test. 


14.2.1 POSTs of Internal Storage Devices


BIST does not test the internal memory arrays in the SSP. The internal
storage on which additional testing should be performed includes:

•  Windowed register file.
•  Instruction cache data.
•  Instruction cache tags.
•  Data cache data.
•  Data cache tags.
•  Store buffer.
•  TLB.


The following subsections contain examples of code segments for testing the
internal storage of the processor.


14.2.1.1 Windowed Register File 


Testing of the internal memory arrays of the processor starts with a
rudimentary test of the windowed registers (Example 14-5). The routine tests
the registers in a window, then moves to the next window. This test sets and
checks several registers simultaneously to avoid activating bypass paths.
The test detects all stuck-bit faults, most bit-line shorts, and many
address-decoding faults. The %g registers are not explicitly tested.
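The pattern sequence used by the test (all ones, 0x55555555, 0xAAAAAAAA, then zero) drives every bit to both 0 and 1, which is why it catches every stuck-at fault. The C model below illustrates only that property on a simulated 16-register window with injected stuck bits; it is not a substitute for the SPARC code, it does not model bit-line shorts, address-decoding faults, or bypass paths, and the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define NREGS 16  /* %o0-%o7 and %l0-%l7 of one window */

struct regfile {
    uint32_t r[NREGS];
    uint32_t stuck_one[NREGS];   /* injected faults: bits forced to 1 */
    uint32_t stuck_zero[NREGS];  /* injected faults: bits forced to 0 */
};

/* A write as seen through any stuck bits. */
static void reg_write(struct regfile *rf, size_t i, uint32_t v)
{
    rf->r[i] = (v | rf->stuck_one[i]) & ~rf->stuck_zero[i];
}

/* Writes each pattern to every register, then checks them all.
 * Returns the index of the first failing register, or -1 if all pass. */
static int test_window(struct regfile *rf)
{
    static const uint32_t patterns[] = {
        0xFFFFFFFFu, 0x55555555u, 0xAAAAAAAAu, 0x00000000u
    };
    for (size_t p = 0; p < sizeof patterns / sizeof patterns[0]; p++) {
        for (size_t i = 0; i < NREGS; i++)
            reg_write(rf, i, patterns[p]);
        for (size_t i = 0; i < NREGS; i++)
            if (rf->r[i] != patterns[p])
                return (int)i;
    }
    return -1;
}
```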


Example 14-5. Windowed Register File Test


internal_post:                          ! test machine
                                        ! Start by testing
                                        ! windowed regs
                                        ! run this with
                                        ! ACTION.MIX=0 or may
                                        ! test bypassing
test_reg_windows:
        set     0x1087, %g7             ! PSR with CWP=7
        set     0x1080, %g2             ! loop limit
        set     -1, %g6
        set     0x55555555, %g5
        set     0xAAAAAAAA, %g4
test_window_loop:
        mov     %g7, %psr               ! set the CWP
                                        ! good value for WIM is
                                        ! 0, no traps
        mov     %g0, %wim               ! set the WIM
        mov     %g6, %o0                ! initialize to F's
        mov     %g6, %o1                ! initialize to F's
        mov     %g6, %o2                ! initialize to F's
        mov     %g6, %o3                ! initialize to F's
        mov     %g6, %o4                ! initialize to F's
        mov     %g6, %o5                ! initialize to F's
        mov     %g6, %o6                ! initialize to F's
        mov     %g6, %o7                ! initialize to F's
        mov     %g6, %l0                ! initialize to F's
        mov     %g6, %l1                ! initialize to F's
        mov     %g6, %l2                ! initialize to F's
        mov     %g6, %l3                ! initialize to F's
        mov     %g6, %l4                ! initialize to F's
        mov     %g6, %l5                ! initialize to F's
        mov     %g6, %l6                ! initialize to F's
        mov     %g6, %l7                ! initialize to F's

        cmp     %o0, %g6                ! check %o0
        bne     tests_bad               ! != so test is bad
        mov     %g5, %o0                ! 5's to %o0
        cmp     %o1, %g6                ! check %o1
        bne     tests_bad               ! != so test is bad
        mov     %g5, %o1                ! 5's to %o1
        cmp     %o2, %g6                ! check %o2
        bne     tests_bad               ! != so test is bad
        mov     %g5, %o2                ! 5's to %o2
        cmp     %o3, %g6                ! check %o3
        bne     tests_bad               ! != so test is bad
        mov     %g5, %o3                ! 5's to %o3
        cmp     %o0, %g5                ! check %o0
        bne     tests_bad               ! != so test is bad
        mov     %g4, %o0                ! A's to %o0
        cmp     %o1, %g5                ! check %o1
        bne     tests_bad               ! != so test is bad
        mov     %g4, %o1                ! A's to %o1
        cmp     %o2, %g5                ! check %o2
        bne     tests_bad               ! != so test is bad
        mov     %g4, %o2                ! A's to %o2
        cmp     %o3, %g5                ! check %o3
        bne     tests_bad               ! != so test is bad
        mov     %g4, %o3                ! A's to %o3
        cmp     %o0, %g4                ! check %o0
        bne     tests_bad               ! != so test is bad
        mov     %g0, %o0                ! zero %o0
        cmp     %o1, %g4                ! check %o1
        bne     tests_bad               ! != so test is bad
        mov     %g0, %o1                ! zero %o1
        cmp     %o2, %g4                ! check %o2
        bne     tests_bad               ! != so test is bad
        mov     %g0, %o2                ! zero %o2
        cmp     %o3, %g4                ! check %o3
        bne     tests_bad               ! != so test is bad
        mov     %g0, %o3                ! zero %o3
test_o4_o5_o6_o7:
        cmp     %o4, %g6                ! check %o4
        bne     tests_bad               ! != so test is bad
        mov     %g5, %o4                ! 5's to %o4


Continue in the same way, testing %o4, %o5, %o6, and %o7, then %l0, %l1,
%l2, and %l3, followed by %l4, %l5, %l6, and %l7.

test loop tail: 


or 
or 
or 
or 
or 
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517,*g4 
tests bad 
590,517 


$00, 501, 500 
500, 502, 500 
500, $03, $00 
500, $04, $00 
500, $05, %00 
500, 506, 500 
$00, 507, 200 
tests bad 


check $17 


l= во test is bad 


zero $17 


is any $0? non-zero 


not 0 so tests bad 
is any $1? non-zero 
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Power-On Self-Test (POS 


or %10,%11, $00 

or 500,%12, $00 

or $00,%13,%00 

or $00, 514, 500 

or 500, 415,500 

ок %00, %16, $00 

orcc 300, 517, 500 

bne tests bad ! not 0 so tests bad 
nop 

add 537, -1, 37 ! update СИР 

стр +97, *92 k ! test for window 0 

bge test window loop 
nop 


ba,a i cache test 


14.2.1.2 Instruction Cache Data
Example 14-6 contains a routine for testing the instruction cache data array. This test uses two helper routines, described in Example 14-9 and Example 14-10.


1) The first step is to write 0xAA into every byte of the cache data storage and verify that each byte still reads back as 0xAA.


2) Next, step 1 is repeated with 0x55 written into and read from each byte.


3) The next step is to write a value based on its address into each cache doubleword and verify that it can be read back. This tests address decoding.


4) Step 3 is performed with the address-dependent data inverted.


5) The final step is to perform step 1 with 0x00 written into and read from each byte. This leaves the cache containing only zeros.
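The five steps can be sketched in C against a simulated data array. The sketch shows why step 3 is needed. It assumes a hypothetical address-decoder fault that makes one doubleword alias another. Fill patterns cannot expose such a fault, because every location holds the same value. Address-dependent data does expose it. The array size, the fault model, and every function name are illustrative only. The sketch does not use the SuperSPARC ASI interface.

```c
#include <assert.h>
#include <stdint.h>

/* Model only: a small simulated data array with a hypothetical
 * address-decoder fault that maps doubleword alias_from onto alias_to. */
#define NDW 64

static uint64_t cells[NDW];
static int alias_from = -1, alias_to = -1;

static int decode(int dw) { return dw == alias_from ? alias_to : dw; }
static void wr(int dw, uint64_t v) { cells[decode(dw)] = v; }
static uint64_t rd(int dw) { return cells[decode(dw)]; }

/* Steps 1, 2 and 5: the same pattern in every doubleword. */
static int fill_test(uint64_t pat) {
    for (int i = 0; i < NDW; i++) wr(i, pat);
    for (int i = 0; i < NDW; i++)
        if (rd(i) != pat) return -1;
    return 0;
}

/* Address-dependent value: address in one word, its complement in the other. */
static uint64_t addr_pattern(int dw, uint32_t comp) {
    uint32_t addr = (uint32_t)dw * 8u;
    return ((uint64_t)(addr ^ comp) << 32) | (uint64_t)(~addr ^ comp);
}

/* Steps 3 and 4: address-dependent data, inverted on one of the two passes. */
static int unique_test(void) {
    const uint32_t comps[2] = {0xFFFFFFFFu, 0u};
    for (int pass = 0; pass < 2; pass++) {
        for (int i = 0; i < NDW; i++) wr(i, addr_pattern(i, comps[pass]));
        for (int i = 0; i < NDW; i++)
            if (rd(i) != addr_pattern(i, comps[pass])) return -1;
    }
    return 0;
}

static int cache_data_test(void) {
    if (fill_test(0xAAAAAAAAAAAAAAAAull)) return -1;
    if (fill_test(0x5555555555555555ull)) return -1;
    if (unique_test()) return -1;
    return fill_test(0);   /* leaves the array all zeros */
}
```

With the alias in place, `fill_test` still passes. The first pass of `unique_test` fails, because the aliased doubleword reads back another address's value.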







Example 14-6. Instruction Cache Data Test


i_cache_test:
        set     0xAAAAAAAA, %o0
        call    i_cache_fill_test
        mov     %o0, %o1            ! (delay slot) pattern in %o0, %o1

        set     0x55555555, %o0
        call    i_cache_fill_test
        mov     %o0, %o1

        call    i_cache_unique_test
        nop

        set     0x0, %o0
        call    i_cache_fill_test
        mov     %o0, %o1

        ba,a    d_cache_test


14.2.1.3 Instruction Cache Tags 
This is left as an exercise to the user. 
14.2.1.4 Data Cache Data 


The data cache data test (see Example 14-7) has the same format as the instruction cache data test. It also uses two helper routines, described in Example 14-11 and Example 14-12. The data cache is also left containing only zeros.


Example 14-7. Data Cache Data Test


d_cache_test:
        set     0xAAAAAAAA, %o0
        call    d_cache_fill_test
        mov     %o0, %o1            ! (delay slot) pattern in %o0, %o1

        set     0x55555555, %o0
        call    d_cache_fill_test
        mov     %o0, %o1

        call    d_cache_unique_test
        nop

        set     0x0, %o0
        call    d_cache_fill_test
        mov     %o0, %o1
        ba,a    sb_test


14.2.1.5 Data Cache Tags 
This is left as an exercise to the user. 
14.2.1.6 Store Buffer Test
This is left as an exercise to the user.


14.2.1.7 TLB Test
This is left as an exercise to the user.
14.2.1.8 Other POSTs
This concludes the POSTs of the internal storage devices.


Next, the external storage devices (including the external cache and main memory) could be tested. See Example 14-8.


When POST is done, branch to reset_states to set up a standard operating state.


Example 14-8. Other POSTs


sb_test:
                                    ! none here
tlb_test:
                                    ! none here
internal_post_done:
external_post:
                                    ! none here yet
external_post_done:
                                    ! then complete reset
                                    ! for standard
                                    ! operation
        ba,a    reset_states

tests_bad:
                                    ! If we reach here, some
                                    ! register has tested
                                    ! bad. Need to try to
                                    ! do some sort of
                                    ! system logging.


14.2.2 POST Support Routines


The following routines are called from the POST code in Subsection 14.2.1. They consolidate portions of the tests that are used more than once.


These are leaf routines, so they do not use SAVE or RESTORE instructions. They accept input operands in the %o registers and use only the %l and %o registers for temporary storage.


14.2.2.1 Instruction Cache Support Routines


The routine in Example 14-9 is the instruction cache data fill test. Its argument is a doubleword pattern in %o0 and %o1. This pattern is stored into each doubleword of the instruction cache. After all doublewords are filled, each doubleword is read back to check the integrity of the pattern. If any mismatch is found, the test branches to tests_bad.
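The fill and uniqueness routines walk the cache with the same two nested loops. The outer loop steps what the listings call a "block" or "line" by 0x04000000. Each step appears to select one way of the set-associative array. The inner loop steps a doubleword offset by 8 below 0x1000. A minimal C sketch of that address generation, using only the constants from Examples 14-9 through 14-12, counts how much storage each routine covers. The function name and callback are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Calls visit(addr) for every diagnostic doubleword address in the same
 * order as the store and compare loops: outer loop over "lines"
 * (LINE_STEP apart, below LINE_LIMIT), inner loop over doublewords
 * (DWORD_STEP apart, below DWORD_LIMIT). Returns the number of addresses. */
static size_t walk_cache(uint32_t line_step, uint32_t line_limit,
                         uint32_t dword_step, uint32_t dword_limit,
                         void (*visit)(uint32_t addr)) {
    size_t n = 0;
    for (uint32_t line = 0; line < line_limit; line += line_step)
        for (uint32_t dw = 0; dw < dword_limit; dw += dword_step) {
            if (visit != NULL)
                visit(line + dw);
            n++;
        }
    return n;
}
```

With the instruction cache constants (limit 0x14000000), the walk visits 2560 doublewords, which is 20 KB. With the data cache constants (limit 0x10000000), it visits 2048 doublewords, which is 16 KB.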


Example 14-9. Instruction Cache Fill Test


i_cache_fill_test:                  ! %o0, %o1 is fill
                                    ! pattern
                                    ! use only %l? and %o?
                                    ! registers
        set     0x0, %l0            ! BLOCK_BASE_ADDR
        set     0x04000000, %l1     ! BLOCK_STEP
        set     0x14000000, %l2     ! BLOCK_LIMIT (5)
        set     0x0, %l3            ! DWORD_BASE_ADDR
        set     0x8, %l4            ! DWORD_STEP
        set     0x00001000, %l5     ! DWORD_LIMIT
i_c_line_st_loop:
i_c_word_st_loop:                   ! store pattern in data
                                    ! dword
        stda    %o0, [%l0+%l3] 0x0d
        add     %l3, %l4, %l3
        cmp     %l3, %l5
        bl      i_c_word_st_loop
        nop
                                    ! move to next line
        add     %l0, %l1, %l0
        cmp     %l0, %l2
        bl      i_c_line_st_loop
        mov     %g0, %l3            ! reset word counter
                                    ! stores are done, now
                                    ! compare
        set     0x0, %l0            ! reset line counter
i_c_line_cmp_loop:
i_c_word_cmp_loop:                  ! load contents of
                                    ! dword into %l6 %l7
        ldda    [%l0+%l3] 0x0d, %l6
        cmp     %o0, %l6
        bne     tests_bad
        nop
        cmp     %o1, %l7
        bne     tests_bad
        nop
                                    ! update word counter
        add     %l3, %l4, %l3
        cmp     %l3, %l5
        bl      i_c_word_cmp_loop
        nop
                                    ! update line counter
        add     %l0, %l1, %l0
        cmp     %l0, %l2
        bl      i_c_line_cmp_loop
        mov     %g0, %l3            ! reset word counter
        retl
        nop


The routine in Example 14-10 tests the address uniqueness of the instruction cache data array. This routine has no arguments. Into every doubleword of the instruction cache, it stores a doubleword made of that doubleword's address in the first word and the complement of the address in the second word. When every cache location has been written, each is read back and checked for corruption. Any mismatch causes a branch to tests_bad. The test runs twice: all data is complemented on the first pass and uncomplemented on the second.
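The value written by the uniqueness test can be written out in C. The function names are hypothetical. The bit layout follows Example 14-10: `stda` stores the even register (%o0, the address) at the lower address and the odd register (%o1, the complemented address) after it. The sketch also shows why the test runs twice. The two passes write bitwise-complementary values to every doubleword, so each bit is exercised at both 0 and 1.

```c
#include <assert.h>
#include <stdint.h>

/* Doubleword that the uniqueness test stores at diagnostic address addr.
 * First word (the even register, %o0) = addr; second word (%o1) = ~addr.
 * Both are XORed with the pass's complement mask: all ones on the first
 * pass, zero on the second. */
static uint64_t unique_pattern(uint32_t addr, int first_pass) {
    uint32_t comp = first_pass ? 0xFFFFFFFFu : 0u;
    return ((uint64_t)(addr ^ comp) << 32) | (uint64_t)(~addr ^ comp);
}

/* Nonzero if the two passes write complementary values to addr, so that
 * every bit of that doubleword is exercised at both 0 and 1. */
static int both_polarities(uint32_t addr) {
    return unique_pattern(addr, 1) == ~unique_pattern(addr, 0);
}
```

For example, on the second pass the doubleword at offset 0x8 holds 0x00000008 in its first word and 0xFFFFFFF7 in its second.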


Example 14-10. Instruction Cache Address Uniqueness Test


i_cache_unique_test:                ! use only %l? and %o?
                                    ! registers
        set     0x0, %l0            ! LINE_BASE_ADDR
        set     0x04000000, %l1     ! LINE_STEP
        set     0x14000000, %l2     ! LINE_LIMIT (5)
        set     0x0, %l3            ! DWORD_BASE_ADDR
        set     0x8, %l4            ! DWORD_STEP
        set     0x00001000, %l5     ! DWORD_LIMIT
        set     -1, %o3             ! COMPLEMENT
i_c_u_complement_loop:
i_c_u_line_st_loop:
i_c_u_word_st_loop:
        add     %l0, %l3, %o0       ! address of this dword
        orn     %g0, %o0, %o1       ! and its complement
        xor     %o0, %o3, %o0       ! apply COMPLEMENT
        xor     %o1, %o3, %o1
                                    ! store pattern in data
                                    ! dword
        stda    %o0, [%l0+%l3] 0x0d
        add     %l3, %l4, %l3
        cmp     %l3, %l5
        bl      i_c_u_word_st_loop
        nop
                                    ! move to next line
        add     %l0, %l1, %l0
        cmp     %l0, %l2
        bl      i_c_u_line_st_loop
        mov     %g0, %l3            ! reset word counter
                                    ! stores are done, now
                                    ! compare
        set     0x0, %l0            ! reset line counter
i_c_u_line_cmp_loop:
i_c_u_word_cmp_loop:
        add     %l0, %l3, %o0
        orn     %g0, %o0, %o1
        xor     %o0, %o3, %o0
        xor     %o1, %o3, %o1
                                    ! load contents of
                                    ! dword into %l6 %l7
        ldda    [%l0+%l3] 0x0d, %l6
        cmp     %o0, %l6
        bne     tests_bad
        nop
        cmp     %o1, %l7
        bne     tests_bad
        nop
                                    ! update word counter
        add     %l3, %l4, %l3
        cmp     %l3, %l5
        bl      i_c_u_word_cmp_loop
        nop
                                    ! update line counter
        add     %l0, %l1, %l0
        cmp     %l0, %l2
        bl      i_c_u_line_cmp_loop
        mov     %g0, %l3            ! reset word counter
        mov     %g0, %l0            ! reset line counter for
                                    ! the second pass
        cmp     %o3, %g0
        bne     i_c_u_complement_loop
        mov     %g0, %o3            ! second pass: no complement
        retl
        nop


14.2.2.2 Data Cache Support Routines


The routine in Example 14-11 is the data cache data array fill test. Its argument is a doubleword pattern in %o0 and %o1. This pattern is stored into each doubleword of the data cache. After all doublewords are filled, each doubleword is read back to check the integrity of the pattern. If any mismatch is found, the test branches to tests_bad.


Example 14-11. Data Cache Fill Test


d_cache_fill_test:                  ! %o0, %o1 is fill
                                    ! pattern
                                    ! use only %l? and %o?
                                    ! registers
        set     0x0, %l0            ! LINE_BASE_ADDR
        set     0x04000000, %l1     ! LINE_STEP
        set     0x10000000, %l2     ! LINE_LIMIT (4)
        set     0x0, %l3            ! DWORD_BASE_ADDR
        set     0x8, %l4            ! DWORD_STEP
        set     0x00001000, %l5     ! DWORD_LIMIT
d_c_line_st_loop:
d_c_word_st_loop:                   ! store pattern in data
                                    ! dword
        stda    %o0, [%l0+%l3] 0x0f
        add     %l3, %l4, %l3
        cmp     %l3, %l5
        bl      d_c_word_st_loop
        nop
                                    ! move to next line
        add     %l0, %l1, %l0
        cmp     %l0, %l2
        bl      d_c_line_st_loop
        mov     %g0, %l3            ! reset word counter
                                    ! stores are done, now
                                    ! compare
        set     0x0, %l0            ! reset line counter
d_c_line_cmp_loop:
d_c_word_cmp_loop:                  ! load contents of
                                    ! dword into %l6 %l7
        ldda    [%l0+%l3] 0x0f, %l6
        cmp     %o0, %l6
        bne     tests_bad
        nop
        cmp     %o1, %l7
        bne     tests_bad
        nop
                                    ! update word counter
        add     %l3, %l4, %l3
        cmp     %l3, %l5
        bl      d_c_word_cmp_loop
        nop
                                    ! update line counter
        add     %l0, %l1, %l0
        cmp     %l0, %l2
        bl      d_c_line_cmp_loop
        mov     %g0, %l3            ! reset word counter
        retl
        nop


The routine in Example 14-12 tests the address uniqueness of the data cache data array. This routine has no arguments. Into every doubleword of the data cache, it stores a doubleword made of that doubleword's address in the first word and the complement of the address in the second word. When every cache location has been written, each is read back and checked for corruption. Any mismatch causes a branch to tests_bad. The test runs twice: all data is complemented on the first pass and uncomplemented on the second.


Example 14-12. Data Cache Address Uniqueness Test


d_cache_unique_test:                ! use only %l? and %o?
                                    ! registers
        set     0x0, %l0            ! LINE_BASE_ADDR
        set     0x04000000, %l1     ! LINE_STEP
        set     0x10000000, %l2     ! LINE_LIMIT (4)
        set     0x0, %l3            ! DWORD_BASE_ADDR
        set     0x8, %l4            ! DWORD_STEP
        set     0x00001000, %l5     ! DWORD_LIMIT
        set     -1, %o3             ! COMPLEMENT
d_c_u_complement_loop:
d_c_u_line_st_loop:
d_c_u_word_st_loop:
        add     %l0, %l3, %o0       ! address of this dword
        orn     %g0, %o0, %o1       ! and its complement
        xor     %o0, %o3, %o0       ! apply COMPLEMENT
        xor     %o1, %o3, %o1
                                    ! store pattern in data
                                    ! dword
        stda    %o0, [%l0+%l3] 0x0f
        add     %l3, %l4, %l3
        cmp     %l3, %l5
        bl      d_c_u_word_st_loop
        nop
                                    ! move to next line
        add     %l0, %l1, %l0
        cmp     %l0, %l2
        bl      d_c_u_line_st_loop
        mov     %g0, %l3            ! reset word counter
                                    ! stores are done, now
                                    ! compare
        set     0x0, %l0            ! reset line counter
d_c_u_line_cmp_loop:
d_c_u_word_cmp_loop:
        add     %l0, %l3, %o0
        orn     %g0, %o0, %o1
        xor     %o0, %o3, %o0
        xor     %o1, %o3, %o1
                                    ! load contents of
                                    ! dword into %l6 %l7
        ldda    [%l0+%l3] 0x0f, %l6
        cmp     %o0, %l6
        bne     tests_bad
        nop
        cmp     %o1, %l7
        bne     tests_bad
        nop
                                    ! update word counter
        add     %l3, %l4, %l3
        cmp     %l3, %l5
        bl      d_c_u_word_cmp_loop
        nop
                                    ! update line counter
        add     %l0, %l1, %l0
        cmp     %l0, %l2
        bl      d_c_u_line_cmp_loop
        mov     %g0, %l3            ! reset word counter
        mov     %g0, %l0            ! reset line counter for
                                    ! the second pass
        cmp     %o3, %g0
        bne     d_c_u_complement_loop
        mov     %g0, %o3            ! second pass: no complement
        retl
        nop
14.3 Reset States


This routine resets the processor's internal states for normal operation.


14.3.1 PSR


First, set the PSR (see Section 4.2) to a standard state defined for the particular system.


The following are guidelines for setting the PSR on startup:


1) Set S to enable the reset handler to perform LDA and STA instructions. PS may be clear to facilitate transferring into user mode with a RETT instruction at the end of the reset handler.


2) ET should be clear because the state isn't set up yet for interrupts or traps.

3) Set CWP to a convenient value.

4) EC is clear because SuperSPARC doesn't have a coprocessor.

5) EF is set so that the floating-point unit (FPU) can be initialized.

6) ICC is set to any convenient value.

7) Set the WIM and TBR registers to values that work in this system.

8) Enable superscalar execution (Example 14-13), which allows the rest of the reset handler to run faster.

14.3.2 Control Registers and Superscalar Execution


Example 14-13 shows the portion of a reset handler that sets the initial values in the important system control registers. In the SSP, this includes enabling the execution of more than one instruction per cycle (superscalar execution).


The registers that should be initialized are:
- Processor State Register (PSR).
- Trap Base Register (TBR).
- Window Invalid Mask (WIM).


Superscalar execution is enabled by setting the multiple instruction execution (MIX) bit in the breakpoint action register (ACTION).


Example 14-13. Reset States: Control Registers and Superscalar Execution


reset_states:
#define INIT_PSR 0x00001087         ! CWP=7, S=1, EF=1
#define INIT_WIM 0x00000001
init_cregs:
        set     INIT_PSR, %g1
        mov     %g1, %psr           ! Initialize PSR
        set     INIT_TBR, %g1
        mov     %g1, %tbr           ! Initialize TBR
        set     INIT_WIM, %g1
        mov     %g1, %wim           ! Initialize WIM
enable_superscalar_exec:
        set     0x0, %g1            ! ECNTL_VADDR
        set     0x01000, %g2        ! ECNTL_DATA
        sta     %g2, [%g1] 0x4C     ! ECNTL_ASI
        ba,a    clear_mmu_and_caches
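The INIT_PSR value can be cross-checked against the guidelines in 14.3.1 by building it from the SPARC V8 PSR fields. The sketch below is a host-side illustration, and the macro names are its own. It also makes two choices that 14.3.1 leaves open: icc=0 and PIL=0.

```c
#include <assert.h>
#include <stdint.h>

/* SPARC V8 PSR fields written by the reset handler (impl/ver are read-only). */
#define PSR_EC      (1u << 13)                   /* coprocessor enable */
#define PSR_EF      (1u << 12)                   /* FPU enable */
#define PSR_PIL(x)  (((uint32_t)(x) & 0xFu) << 8)
#define PSR_S       (1u << 7)                    /* supervisor */
#define PSR_PS      (1u << 6)                    /* previous supervisor */
#define PSR_ET      (1u << 5)                    /* traps enabled */
#define PSR_CWP(x)  ((uint32_t)(x) & 0x1Fu)

#define NWINDOWS 8u  /* SuperSPARC implements 8 register windows */

/* Start-up PSR following the guidelines in 14.3.1: S=1, PS=0, ET=0,
 * EC=0, EF=1, icc=0, PIL=0, CWP=cwp. */
static uint32_t startup_psr(unsigned cwp) {
    assert(cwp < NWINDOWS);
    return PSR_EF | PSR_S | PSR_PIL(0) | PSR_CWP(cwp);
}
```

`startup_psr(7)` reproduces the listing's 0x00001087, with ET, EC, and PS all clear.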


14.3.3 Initialization of MMU and Caches
The fault status register (MFSR), translation lookaside buffer (TLB), and caches must be cleared before the MMU or caches can be enabled. The procedure is as follows:
1) Clear the MFSR by reading it.
2) Invalidate the TLB entries with a demap_all operation.


3) Clear the instruction cache valid bits and lock bits with an instruction cache flash clear.


4) Clear the data cache valid bits and lock bits with a data cache flash clear.


5) Initialize the MMU context register and MMU context table pointer to values suitable for use in the system.


CTP_VALUE is the physical address of the context table in memory (see Subsection 9.12.2).


Example 14-14 shows code to initialize the MMU and caches.
Example 14-14. MMU and Cache Initialization


clear_mmu_and_caches:
clear_mmu_mfsr_reg:
        set     0x0300, %g1         ! MFSR_VADDR
                                    ! Clear the MFSR by
                                    ! reading it
        lda     [%g1] 0x04, %g0     ! MFSR_ASI
demap_all_tlb_entries:
        set     0x0400, %g1         ! MENTIRE_FLUSH_VADDR
                                    ! Demap all entries
        sta     %g0, [%g1] 0x03     ! MENTIRE_FLUSH_ASI
clear_icache_valid_bits:
        set     0x0, %g1            ! CENTIRE_FLUSH_VADDR
                                    ! Clear all valid-bits
        sta     %g0, [%g1] 0x36     ! CENTIRE_FLUSH_ASI
clear_icache_lock_bits:
        set     0x8000000, %g1      ! CENTIRE_LOCK_FLUSH_VADDR
                                    ! Clear all LOCK-bits
        sta     %g0, [%g1] 0x36     ! CENTIRE_LOCK_FLUSH_ASI
clear_dcache_valid_bits:
        set     0x0, %g1            ! DENTIRE_FLUSH_VADDR
                                    ! Clear all valid-bits
        sta     %g0, [%g1] 0x37     ! DENTIRE_FLUSH_ASI
clear_dcache_lock_bits:
        set     0x80000000, %g1     ! DENTIRE_LOCK_FLUSH_VADDR
                                    ! Clear all LOCK-bits
        sta     %g0, [%g1] 0x37     ! DENTIRE_LOCK_FLUSH_ASI
init_mmu_context_reg:
        set     0x0200, %g1         ! MCONTEXT_VADDR
        set     USER_CONTEXT_NUMBER, %g2
        sta     %g2, [%g1] 0x04     ! MCONTEXT_ASI
init_mmu_ctp_reg:
        set     0x0100, %g1         ! MCTP_VADDR
        set     CTP_VALUE, %g2
        sta     %g2, [%g1] 0x04     ! MCTP_ASI

        ba,a    init_mcc_control_reg


14.3.4 MultiCache Controller


If the configuration contains a MultiCache Controller (MXCC), it must be initialized. The code in Example 14-15 initializes the MXCC Control register and the MXCC Interrupt Mask register.


The value of MXCC_CNTL_DATA_WORD is system-dependent. Following are some suggested starting-point values for an MBus system:

- RC  = 0   count both read and write references
- BWC = 0   no BWs connected*
- WI  = 0   no write invalidate*
- PF  = 1   prefetching enabled
- MC  = 1   multiple commands enabled
- PE  = 1   parity enabled
- CE  = 1   E-cache enabled
- CS  = 0   1 MB of cache*
- HC  = 0   not half-sized cache*

* Ignored in MBus configuration.


Following are some suggested starting-point values for an XBus system:

- RC  = 0   count both read and write references
- BWC = number of BWs connected
- WI  = 1   write invalidate enabled
- PF  = 1   prefetching enabled
- MC  = 1   multiple command execution enabled
- PE  = 1   parity enabled
- CE  = 1   external cache enabled
- CS  = 0   1 MB E-cache
- HC  = 0   not half-sized E-cache


Example 14-15. MXCC Initialization


#ifdef MXCC
#define MXCC_CNTL_DATA_WORD_MBUS 0x0000003c
#define MXCC_CNTL_DATA_WORD_XBUS 0x0000017c     ! 2 BWs
#define MXCC_CNTL_DATA_WORD      MXCC_CNTL_DATA_WORD_MBUS
init_mcc_control_reg:
        set     MXCC_CNTL_DATA_WORD, %g1
        set     0x01C00200, %g2     ! MXCC_CNTL_VADDR
        stda    %g0, [%g2] 0x02     ! MXCC_CNTL_ASI
init_mcc_int_mask:
                                    ! allow all MXCC XBus
                                    ! interrupts; they won't be
                                    ! recognized until
                                    ! PSR.ET=1
        set     0x01c00500, %g2     ! MXCC_INT_MASK_ADDR
        mov     0x0, %g1            ! zero in %g0,%g1 pair
        stda    %g0, [%g2] 0x02     ! MXCC_CNTL_ASI
#endif

14.3.5 MMU Control Register


Next, the MMU control register (MCNTL) must be initialized (see Example 14-16). The value for MCNTL depends on the MMU functions desired in the system (see Subsection 9.12.1).


The following subsections contain some suggested starting-point values for the MMU control register.


MCNTL.TC and MCNTL.AC


- TC = 1   MMU tablewalks cacheable in the external cache
- AC = 0   alternate accesses not cacheable


The MCNTL.TC and MCNTL.AC values depend on how software accesses the in-memory page tables and other data in alternate spaces.


The tablewalk cacheable bit indicates external cacheability, even though tablewalk data is never cached internally.


The AC bit should be set to the same value as the C bit field of the store buffer tag register (since this is the state of the C bit in the original transaction).


MCNTL.BT


- BT = 1   boot mode disabled


Reset initializes MCNTL.BT to zero, which forces boot mode translations in the MMU for instruction fetch (see Section 9.10). This point in the reset handler may be too early for some systems to disable boot mode. In many systems, the contents of the boot PROM may need to be copied somewhere else and some MMU mappings established before switching off boot mode.


MCNTL.PE and MXCCCR.PE


- PE = 1   parity generation/checking enabled

MCNTL.PE should match MXCCCR.PE.


MCNTL.PSO


- PSO = 0  TSO


Whether PSO is recommended depends on whether the software can accommodate it. See Chapter 8.


MCNTL.SE, MCNTL.SB, MCNTL.IE, and MCNTL.DE


- SE = 1   snooping enabled
- SB = 1   store buffer enabled
- IE = 1   I-cache enabled
- DE = 1   D-cache enabled


It is generally recommended that snooping, the store buffer, and the caches be enabled. See Chapter 10.


MCNTL.NF and MCNTL.EN
- NF = 0   no-fault mode disabled
- EN = 1   MMU enabled

Note:

In order for the MMU to be enabled, the page tables, context tables, context table pointer, and context register must be set up. See Section 9.2.


IFLUSH Instruction


After storing the MCNTL register with changes affecting instruction access, issue an IFLUSH instruction to synchronize instruction fetch with the change to MCNTL. A JMP instruction may be included between the STA to MCNTL and the IFLUSH to transfer to a new address synchronously with the change in MMU modes.
Example 14-16. MMU Control Register


#define MCNTL_DATA_WORD 0x00017f01  ! NOTE BT=1
init_mmu_mcntl_reg:
        set     0x0, %g1            ! MCNTL_VADDR
        set     MCNTL_DATA_WORD, %g2
                                    ! init MCNTL
        sta     %g2, [%g1] 0x04     ! MCNTL_ASI
        iflush  %g0                 ! *** IMPORTANT ***

/*
 * New MMU modes take effect with the next instruction
 * after the IFLUSH. Boot mode could be off here, so the
 * next instruction could be from a non-contiguous
 * address. Use caution!
 */
14.3.6 Floating-Point Unit


Next, the floating-point unit is initialized. The new FSR value can be loaded only from memory (see Example 14-17).
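The INIT_FSR constant used in Example 14-17 (0x0F800000) sets only the FSR's trap enable mask. A small C sketch decodes it using the SPARC V8 FSR layout: RD in bits 31:30 and TEM in bits 27:23. The helper names are hypothetical. The result is TEM=0x1F, meaning all five IEEE trap enables (NVM, OFM, UFM, DZM, NXM) are set. RD=0 selects round-to-nearest, and every other field is zero.

```c
#include <assert.h>
#include <stdint.h>

/* SPARC V8 FSR fields used here: RD = bits 31:30, TEM = bits 27:23. */
#define FSR_RD_SHIFT   30u
#define FSR_TEM_SHIFT  23u
#define FSR_TEM_MASK   (0x1Fu << FSR_TEM_SHIFT)

/* TEM bits, from bit 27 down to bit 23. */
enum { TEM_NVM = 0x10, TEM_OFM = 0x08, TEM_UFM = 0x04, TEM_DZM = 0x02, TEM_NXM = 0x01 };

static uint32_t make_fsr(uint32_t rd, uint32_t tem) {
    return ((rd & 0x3u) << FSR_RD_SHIFT) | ((tem & 0x1Fu) << FSR_TEM_SHIFT);
}

static uint32_t fsr_tem(uint32_t fsr) { return (fsr & FSR_TEM_MASK) >> FSR_TEM_SHIFT; }
static uint32_t fsr_rd(uint32_t fsr)  { return fsr >> FSR_RD_SHIFT; }
```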


Example 14-17. Floating-Point Unit Initialization


#define INIT_FSR              0x0F800000    ! TEM=0x1F
#define FSR_SCRATCH_DATA_ADDR 0x000F0000    ! address of data word;
                                            ! can read and write
init_fpu_fsr:
        set     FSR_SCRATCH_DATA_ADDR, %g2
        set     INIT_FSR, %g1
        st      %g1, [%g2]          ! store init data
        ld      [%g2], %fsr         ! Initialize FSR


14.3.7 End Reset


Example 14-18 shows a JMP/RETT pair that transfers to ENTRY_POINT with S set from PS and ET set to 1.
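The effect of the RETT on the PSR can be modeled in C from the SPARC V8 definition of RETT:
- CWP becomes (CWP+1) mod NWINDOWS.
- S takes the value of PS.
- ET becomes 1.

RETT completes normally only if it executes with S=1 and ET=0, and only if the window it moves into is not marked invalid in WIM. This is a host-side sketch. The function and macro names are hypothetical, and NWINDOWS=8 is SuperSPARC's window count.

```c
#include <assert.h>
#include <stdint.h>

#define NWINDOWS      8u
#define PSR_S         (1u << 7)
#define PSR_PS        (1u << 6)
#define PSR_ET        (1u << 5)
#define PSR_CWP_MASK  0x1Fu

/* PSR after RETT (SPARC V8): CWP <- CWP+1 mod NWINDOWS, S <- PS, ET <- 1.
 * *ok is cleared if RETT would not complete normally: it must execute with
 * S=1 and ET=0, and the window it moves into must not be marked in WIM. */
static uint32_t rett_psr(uint32_t psr, uint32_t wim, int *ok) {
    uint32_t new_cwp = ((psr & PSR_CWP_MASK) + 1u) % NWINDOWS;
    *ok = (psr & PSR_S) != 0 && (psr & PSR_ET) == 0 && (wim & (1u << new_cwp)) == 0;

    uint32_t out = psr & ~(PSR_S | PSR_CWP_MASK);
    if (psr & PSR_PS)
        out |= PSR_S;
    return out | PSR_ET | new_cwp;
}
```

Starting from a PSR of 0x00001087 (CWP=7, PS=0), the RETT leaves CWP=0, S=0, and ET=1. WIM bit 0 must therefore be clear at the moment the RETT executes.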


Example 14-18. End Reset


end_reset:
                                    ! Go do some work.
                                    ! Below example uses
                                    ! JMP/RETT
                                    ! to go to user mode
                                    ! program. S <- PS and
                                    ! ET <- 1
                                    ! ENTRY_POINT points to
                                    ! program entry point
        set     ENTRY_POINT, %l2
        jmp     %l2                 ! Start the user
        rett    %l2 + 4             ! program
Chapter 15


The SuperSPARC processor (SSP) implements software-debugging capabilities and external monitors. These enable the user to examine the SSP at each cycle and aid in system debug.
15.1 Software Debugging Facilities


The SSP provides debugging capabilities to facilitate debugging software and hardware prototypes. Four types of breakpoint mechanisms are provided:

- Code address,
- Data address,
- Instruction count underflow, and
- Cycle count underflow.


These mechanisms can be selectively enabled to generate either data_access_exceptions or instruction_access_exceptions. They can be programmed to generate a selectable interrupt. In addition, they can be set to activate an external pin to easily trigger external analysis equipment. They are also a fundamental part of SuperSPARC's scan-based debug features, described in Chapter 22.


In addition to the address breakpoint facilities, programmable timers are provided for debug, code profiling, and performance analysis. They can provide a high-precision, low-overhead timer for benchmarking small applications. Their major application is to assist development or diagnostic teams in early hardware and software debug.


15.1.1 Address Breakpoints - Code (Instruction) or Data


A single code (instruction) or data breakpoint register is available. This breakpoint can match on either 32-bit virtual or 36-bit physical addresses for code or data. When a breakpoint is set on an instruction, the instruction is not executed. This holds true for fault, interrupt, and scan-based debug breakpoints.


Note:

A maximum of one code space breakpoint or one data space breakpoint can be active at a given time.


Each bit in the bitwise address comparison can be masked off separately to force equality on that bit. The address equality bitmasks can be used to find references within a particular segment, page, cache line, or word, independent of the size of the access. Data space breakpoints can be qualified by access type (reads only, writes only, or read-or-write). Atomic references (e.g., SWAP/LDSTUB) are considered both reads and writes.


The action-upon-event control register (ACTION) specifies whether the address breakpoint should generate an exception or interrupt or activate the external strobe (ESB) pin. The action-upon-event register is defined in Subsection 15.2.5.
15.1.2 Counter Breakpoints - Code (Instruction) or Cycle


A single 32-bit-wide control register programs the two 16-bit counters for instruction and cycle counts at the same time. It is not possible to modify one counter without the other.


Note:

A maximum of one instruction count breakpoint and one cycle count breakpoint can be active at a given time.


The 16-bit cycle counter counts up to about 1.3 milliseconds at 50 MHz (65,536 cycles / 50 MHz = 1.31 ms). Longer-duration counters must be simulated in software by accumulating underflows of the 16-bit counter into a larger counter in memory.
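The software extension described above can be sketched in C. The sketch makes three assumptions. First, the underflow interrupt handler calls a hypothetical `on_cycle_underflow` exactly once for each wrap of the 16-bit down-counter from 0 to 0xFFFF. Second, the counter was loaded with `start`. Third, the current counter value is available as `now`. None of these names come from a SuperSPARC API.

```c
#include <assert.h>
#include <stdint.h>

#define CYCLE_COUNTER_PERIOD 65536u  /* a 16-bit down-counter wraps every 2^16 cycles */

/* Hypothetical software extension of the 16-bit cycle counter, kept in memory. */
typedef struct {
    uint64_t underflows;   /* incremented by the underflow interrupt handler */
} cycle_ext_t;

static void on_cycle_underflow(cycle_ext_t *c) { c->underflows++; }

/* Cycles elapsed since the counter was loaded with `start`, given the
 * current 16-bit counter value `now` (counting down, wrapping at 0). */
static uint64_t elapsed_cycles(const cycle_ext_t *c, uint16_t start, uint16_t now) {
    return c->underflows * CYCLE_COUNTER_PERIOD + (uint64_t)start - (uint64_t)now;
}
```

For example, a counter loaded with 100 that has wrapped once and now reads 0xFFFF has run for 101 cycles.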


The instruction counter can count either faster or slower than the cycle counter, depending on the execution characteristics of the processor. It could theoretically count three times faster if SuperSPARC were continuously executing three-instruction groups. In general, it counts slightly faster than the cycle counter.


Together, the two counter interrupts can be used to calculate the dynamic performance of the executing program in instructions per cycle. At a known clock rate, that figure converts to millions of instructions per second (MIPS).
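The calculation is simple arithmetic on the two counts. A minimal C sketch, with hypothetical function names, takes the instruction and cycle totals for an interval (for example, accumulated as in the previous sketch) and a clock frequency in MHz:

```c
#include <assert.h>
#include <stdint.h>

/* Instructions per cycle over an interval measured with the two counters. */
static double ipc(uint64_t instructions, uint64_t cycles) {
    return cycles ? (double)instructions / (double)cycles : 0.0;
}

/* MIPS at a given clock frequency in MHz: IPC x MHz. */
static double mips(uint64_t instructions, uint64_t cycles, double clock_mhz) {
    return ipc(instructions, cycles) * clock_mhz;
}
```

For example, 150 instructions completed in 100 cycles is an IPC of 1.5, or 75 MIPS at 50 MHz.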


Cycle counter underflow events are always reported to the last valid instruction in the current instruction group. Instruction counter events occur as interrupts or scan-based debug requests to the instruction after the instruction that causes the counter event. In other words, an instruction counter expiration event is reported after the first instruction that causes the underflow.


Once set, the counter expiration event persists until it is serviced. The instruction that caused the counter event will ultimately be restarted.


When a cycle counter expires, the action for that event may be deferred until valid instructions are available and the pipeline is able to progress. If the action is deferred, the request (interrupt or scan-based debug) persists until the pipeline is able to continue. Once the initial event is signalled, the cycle counter continues counting down, wrapping through the most positive number. In this way, the total number of cycles elapsed from a given point may be calculated.


The instruction counter decrements the existing instruction count by the number of instructions that complete execution in a given cycle. This number can be anywhere from 0 to 3 instructions per cycle. The cycle counter always decrements by 1.


15.1.3 Setting Up Breakpoints


Table 15-1 describes how to set up breakpoints using the proper control registers. It also shows where to find status information indicating which actions the breakpoint triggered.
Table 15-1. Breakpoints: Control and Status


[Table body not recoverable from the source scan.]

Legend: S = set BCIPL to select IRL; x = don't care; E = event-dependent; u = unchanged.


In Table 15-1, the notations S, x, and u are self-explanatory. The notation E, given to an affected status bit, indicates that the bit is event-dependent. The state of the bit after a breakpoint depends on whether a particular event has occurred. For example, the BKS.DBKIS bit in the "data address breakpoint ESB" row is marked E. It will be set if ACTION.IEN_DBK and ACTION.BCIPL were set when the breakpoint occurred. (In that row, these two fields are marked "don't care".)


For all the scan-based debug request cases that are initiated by breakpoints, the MCMD.INITM bit must be set. See Chapter 22.


When setting a breakpoint, the programmer must allow for the latency before the breakpoint is guaranteed to be active. To achieve this, follow the last store to ASI 0x38 that sets the breakpoint registers with an IFLUSH or a BA instruction.


Also, after the breakpoint has been taken, the programmer must explicitly disable that particular breakpoint (by clearing the appropriate control bits). Otherwise the same breakpoint will be taken again when normal execution resumes.


15.1.4 Priorities of Debug Interrupts and Exceptions


All address breakpoint interrupts and counter underflow interrupts use a common user-programmable interrupt priority level (ACTION.BCIPL; see Subsection 15.2.5). Status bits are provided to differentiate between these interrupt sources. These interrupts and exceptions follow the priority order described below.


1) If multiple breakpoint fault events (code, data) are signalled simultaneously, both the code and data breakpoint status registers set their fault status bits.


2) If multiple breakpoint interrupt events are signalled simultaneously, each activity sets its interrupt status bits.


3) Exception sources from multiple instructions are prioritized by instruction order. An exception reported to an instruction has higher priority than a simultaneous exception to the next instruction.


4) Asynchronous data access exceptions are honored before instruction access exceptions.


5) Instruction access exceptions are honored before synchronous data access exceptions, and all data access exceptions are honored before interrupts.
15.2 Counters and Breakpoints


15.2.1 MMU Breakpoint Control Registers (ASI=0x38)


There are four memory-mapped, SuperSPARC-specific MMU breakpoint diagnostic registers (see Table 15-2). These registers support doubleword access only; any other access size causes a data access exception. All breakpoint enable and status bits are cleared at reset. All values and masks are unchanged by reset.


A single address breakpoint is controlled by these four registers. The address breakpoint may be set in code space or data space, but not both simultaneously.


An address breakpoint can:
- Generate an instruction or data access breakpoint exception.
- Generate an instruction or data address breakpoint interrupt.
- Generate an instruction or data address breakpoint scan-based debug request.
- Enable the ESB pin.
- None of the above.


The response is selected by the breakpoint control register (BKC), the ACTION register, and the JTAG MCMD scan register. More details on the JTAG MCMD register can be found in Chapter 22.


Instruction and data address breakpoints share the same breakpoint register set. The address map for these MMU diagnostic registers is shown in Table 15-2.


Table 15-2. MMU Diagnostic (Breakpoint) Registers


[Table body not recoverable from the source scan.]


These registers are defined as follows:
15.2.1.1 Breakpoint Value Register (BKV) 
This register defines the code or data breakpoint address value. 


Figure 15-1. Breakpoint Value Register (BKV)

| res (63:36) | BKV (35:0) |

res Reserved. These bits are ignored on writes and read
as zero.

BKV Breakpoint Value. Contains the 36-bit value with
which the physical or virtual address of the code or
data being accessed is compared (code versus data
space is selected by BKC.CSPACE, and physical versus
virtual address by BKC.PAMD; see Subsection 15.2.1.3).
When a match occurs, an event is generated depending
on the state of the BKC and ACTION registers. These
actions are also affected by the JTAG MCMD.INITM bits
(not programmer-visible).


15.2.1.2 Breakpoint Mask Register (BKM) 
This register defines a mask value useful for address-matching across a 
range. 

Figure 15-2. Breakpoint Mask Register (BKM)

| res (63:36) | BKM (35:0) |


res Reserved. These bits are ignored on write and read 
as zero. 


BKM Breakpoint Mask. This field defines a per-bit compar- 
ison mask for the BKV field. For any bit that is set in 
BKM, the equivalent bit in BKV is ignored in the ad- 
dress comparison. This can be used to match on 
ranges of addresses. 
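As an illustration of the masked comparison, the C sketch below models how BKV and BKM combine. It is a software model under stated assumptions, not the hardware logic: addresses are held in 64-bit integers, only bits 35:0 take part, and the function name `bk_address_match` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* BKV and BKM fields occupy bits 35:0; bits 63:36 are reserved. */
#define BK_FIELD_MASK ((UINT64_C(1) << 36) - 1)

/* Hypothetical model of the BKV/BKM comparison: every bit set in BKM is
 * ignored, and every other bit of the 36-bit field must be equal in the
 * access address and in BKV for the breakpoint to match. */
static bool bk_address_match(uint64_t addr, uint64_t bkv, uint64_t bkm)
{
    assert((bkv & ~BK_FIELD_MASK) == 0);   /* reserved bits read as zero */
    assert((bkm & ~BK_FIELD_MASK) == 0);

    uint64_t compared = ~bkm & BK_FIELD_MASK;   /* bits that take part */
    return ((addr ^ bkv) & compared) == 0;
}
```

For example, BKV = 0x40000 with BKM = 0xFFF matches every address from 0x40000 through 0x40FFF, a 4-Kbyte range.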


15.2.1.3 Breakpoint Control Register (BKC) 


This register controls whether the breakpoint is set for code-space or
data-space accesses, and whether a physical or virtual address is compared.
It also controls the enables for the different types of breakpoints. All bits
are cleared on both hardware and watchdog reset.

Figure 15-3. Breakpoint Control Register (BKC)

| res (63:7) | CSPACE (6) | PAMD (5) | CBFEN (4) | CBKEN (3) | DBFEN (2) | DBREN (1) | DBWEN (0) |
res Reserved. This field is ignored on writes and read as
zero. 
CSPACE Code Space Address. If CSPACE=1, the address in
the BKV register is compared to code-space addresses,
and DBREN, DBWEN, and DBFEN are ignored. If
CSPACE=0, the address in the BKV register is com-
pared to data-space addresses, and CBKEN and CBFEN
are ignored.


PAMD Physical Address. If PAMD=1, the physical address
is compared (BKV[35:0]). If PAMD=0, the virtual address
is compared (BKV[31:0]), and BKV[35:32] are ignored.


CBFEN Enable Code Breakpoint Fault Generation. When 
this bit is set, a code breakpoint match will cause an 
instruction access exception. When this bit is 
cleared, an interrupt as defined in ACTION will be re- 
ported (see Subsection 15.2.5, Action Register). 


CBKEN Enable Code Breakpoints. If disabled, no code 
breakpoint can occur. 
DBFEN Enable Data Breakpoint Fault Generation. When this
bit is set, a data breakpoint match will cause a
data access exception. When this bit is cleared
(and either DBREN or DBWEN is set), an interrupt as
defined in the ACTION register will be reported (see
Subsection 15.2.5, Action Register).


DBREN Enable Data Breakpoints for Read Transactions. 
This bit must be set (with CSPACE cleared) for data 
read breakpoints to occur. 


DBWEN Enable Data Breakpoints for Write Transactions.
This bit must be set (with CSPACE cleared) for data 
write breakpoints to occur. 
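The following C sketch shows how BKC values might be composed. The bit positions are an assumption taken from the layout in Figure 15-3 (CSPACE in bit 6 down to DBWEN in bit 0), and the helper names are hypothetical; the double-word alternate-space store to ASI 0x38 that actually writes the register is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* BKC bit positions as read from Figure 15-3 (assumed layout). */
#define BKC_CSPACE (UINT64_C(1) << 6)
#define BKC_PAMD   (UINT64_C(1) << 5)
#define BKC_CBFEN  (UINT64_C(1) << 4)
#define BKC_CBKEN  (UINT64_C(1) << 3)
#define BKC_DBFEN  (UINT64_C(1) << 2)
#define BKC_DBREN  (UINT64_C(1) << 1)
#define BKC_DBWEN  (UINT64_C(1) << 0)

/* Data-space breakpoint (CSPACE=0). fault=true sets DBFEN so a match
 * raises a data access exception; fault=false leaves the response to
 * the ACTION register. */
static uint64_t bkc_data_breakpoint(bool on_read, bool on_write,
                                    bool physical, bool fault)
{
    assert(on_read || on_write);          /* otherwise nothing can match */
    uint64_t bkc = 0;
    if (on_read)  bkc |= BKC_DBREN;
    if (on_write) bkc |= BKC_DBWEN;
    if (physical) bkc |= BKC_PAMD;
    if (fault)    bkc |= BKC_DBFEN;
    return bkc;
}

/* Code-space breakpoint (CSPACE=1, CBKEN=1). */
static uint64_t bkc_code_breakpoint(bool physical, bool fault)
{
    uint64_t bkc = BKC_CSPACE | BKC_CBKEN;
    if (physical) bkc |= BKC_PAMD;
    if (fault)    bkc |= BKC_CBFEN;
    return bkc;
}

/* BKV bits that take part in the comparison: 36 with PAMD=1 (physical,
 * BKV[35:0]) and 32 with PAMD=0 (virtual, BKV[31:0]). */
static unsigned bkc_compared_bits(uint64_t bkc)
{
    return (bkc & BKC_PAMD) ? 36u : 32u;
}
```

Under these assumptions, a physical-address, write-only data breakpoint that raises a data access exception is `bkc_data_breakpoint(false, true, true, true)`, or 0x25.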


15.2.1.4 Breakpoint Status Register (BKS) 


This register reports on the status of either code or data breakpoints. Any
type of reset sequence, or any ASI load from this register, will clear all the
bits in this register.

Table 15-3. Breakpoint Status Register (BKS)

| CBKIS | CBKFS | DBKIS | DBKFS |
CBKIS Code Breakpoint Interrupt. Indicates that an interrupt
was generated as a result of a code breakpoint.

CBKFS Code Breakpoint Access Exception. Indicates that a
code access exception was generated as a result of
a code breakpoint. This bit is also set on an
instruction access exception.

DBKIS Data Breakpoint Interrupt. Indicates that an interrupt
was generated as a result of a data breakpoint.

DBKFS Data Breakpoint Access Exception. Indicates that a
data access exception was generated as a result of
a data breakpoint. This bit is also set on a data
access exception.


15.2.2 Counter Breakpoint Value Register (CTRV) ASI=0x49 


The CTRV register holds the counter breakpoint values and is composed of
the following two 16-bit fields:


Figure 15-4. Counter Breakpoint Value Register (CTRV)

| ICNT (31:16) | CCNT (15:0) |

ICNT Instructions Left. This field specifies the number of
instructions left before the next counter event. (See
Note below.)

CCNT Cycles Left. This field specifies the number of cycles
left before the next counter event. (See Note below.)


Neither counter is decremented when SuperSPARC is in scan-based debug
mode. ICNT and CCNT are read and written as a pair and are unchanged at
reset.
Note:


The total number of instructions remaining to execute before the next counter
event is the value held in CTRV.ICNT (readable by LDA 0x49) plus the number
of instructions in the group prior to the group containing the LDA 0x49
instruction. If the group prior to the group containing the LDA 0x49
instruction has zero instructions, the value held in CTRV.ICNT reflects the
exact number of instructions remaining before the next counter event. One
way to arrange this is to use a JMPL and a BA branching to the LDA 0x49,
which forces a bubble (hence a zero-instruction group) between the BA and
the LDA. Another is to place a sequential instruction before the LDA 0x49, or
to use any other method that guarantees a group with a known number of
instructions, so that the programmer can obtain an exact instruction count
until the counter event. Similarly, the number of cycles remaining before the
next counter event is always one more than the value held in CTRV.CCNT
(readable by LDA 0x49).
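The arithmetic in the note can be sketched in C as follows. It assumes, from Figure 15-4, that ICNT occupies bits 31:16 and CCNT bits 15:0 of the word returned by LDA 0x49, and that an issue group holds at most three instructions (the range reported by PIPE[2:1]). The function names are hypothetical, and the caller must know the size of the preceding group by construction (zero when a bubble has been forced).

```c
#include <assert.h>
#include <stdint.h>

/* CTRV layout (assumed from Figure 15-4): ICNT in 31:16, CCNT in 15:0. */
static uint32_t ctrv_icnt(uint32_t ctrv) { return (ctrv >> 16) & 0xFFFFu; }
static uint32_t ctrv_ccnt(uint32_t ctrv) { return ctrv & 0xFFFFu; }

/* Instructions left before the counter event: CTRV.ICNT plus the number of
 * instructions in the group just before the group holding the LDA 0x49. */
static uint32_t instructions_remaining(uint32_t ctrv, uint32_t prior_group_size)
{
    assert(prior_group_size <= 3);   /* at most three instructions per group */
    return ctrv_icnt(ctrv) + prior_group_size;
}

/* Cycles left before the counter event: always CTRV.CCNT + 1. */
static uint32_t cycles_remaining(uint32_t ctrv)
{
    return ctrv_ccnt(ctrv) + 1;
}
```

For example, if LDA 0x49 returns 0x00640010 and a bubble was forced before it, 100 instructions and 17 cycles remain before the respective counter events.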





15.2.3 Counter Breakpoint Control Register (CTRC) ASI=0x4a


This register enables either the instruction counter or cycle counter. The num- 
ber of instructions or cycles counted is specified in the CTRV register (see 
Subsection 15.2.2). 


| res (31:2) | ICNTEN (1) | CCNTEN (0) |

res Reserved. This field is ignored on writes and read as
zero.


ICNTEN Enable Instruction Counter. This bit, when set, will 
enable the instruction counter in the CTRV register. 
This bit is cleared on either a power-on reset or 


watchdog reset. 


CCNTEN Enable Cycle Counter. This bit, when set, will enable 
the cycle counter in the CTRV register. This bit is 
cleared on either a power-on reset or watchdog re- 
set. 
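A minimal C sketch of the values written to CTRV and CTRC follows. The CTRV field positions follow Figure 15-4; the CTRC positions (ICNTEN in bit 1, CCNTEN in bit 0) are an assumption based on the register layout shown above, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed CTRC enable bits. */
#define CTRC_ICNTEN (1u << 1)
#define CTRC_CCNTEN (1u << 0)

/* Pack the two 16-bit counts; ICNT and CCNT are always written as a pair. */
static uint32_t ctrv_pack(uint32_t icnt, uint32_t ccnt)
{
    assert(icnt <= 0xFFFFu && ccnt <= 0xFFFFu);
    return (icnt << 16) | ccnt;
}

/* Select which counter(s) CTRC enables. */
static uint32_t ctrc_value(bool count_instructions, bool count_cycles)
{
    return (count_instructions ? CTRC_ICNTEN : 0u) |
           (count_cycles ? CTRC_CCNTEN : 0u);
}
```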


15.2.4 Counter Breakpoint Status Register (CTRS) ASI=0x4b 
This register records the status of the instruction counter and the cycle
counter. It indicates whether an interrupt was generated or scan-based debug
mode was entered when a cycle or instruction counter overflow occurred.
Figure 15-5. Counter Breakpoint Status Register (CTRS)

| res (31:2) | ZICIS (1) | ZCCIS (0) |

res Reserved. This field is ignored on writes and read as 
zero. 

ZICIS Instruction Counter Interrupt Status. Records the sta- 


tus of whether an instruction counter interrupt was 
generated. This bit is not set when a scan-based de- 
bug request is induced by this event and is cleared on 
either a power-on reset or watchdog reset. 


ZCCIS Cycle Counter Interrupt Status. Records the status of 
whether a cycle counter interrupt was generated. 
This bit is not set when a scan-based debug request 
is induced by this event and is cleared on either a 
power-on reset or watchdog reset. 


The SSP can read and write the ASI memory-mapped counter breakpoint status
register. Its bits are also set implicitly by counter events that are enabled
in the ACTION register.

15.2.5 Breakpoint ACTION Register ASI=0x4c 


The breakpoint ACTION register specifies what the SSP will do when a break- 
point occurs. It also specifies the interrupt level for breakpoint interrupts and 
the multiple instruction per cycle (superscalar) execution mode. 


Figure 15-6. Breakpoint Action Register (ACTION)

| res | MIX (12) | BCIPL (11:8) | E_CBK (7) | E_ZIC (6) | E_DBK (5) | E_ZCC (4) | I_CBK (3) | I_ZIC (2) | I_DBK (1) | I_ZCC (0) |


res Reserved. This field is ignored on writes and reads as
zero.

MIX Enable Multiple-Instruction-Per-Cycle Execution
Mode. When cleared, SuperSPARC will issue no
more than one instruction every cycle. This bit is
cleared at reset.


Note: 


In order for the SuperSPARC processor to execute as a superscalar proces- 
sor, ACTION.MIX must be set. If this field is not set, SuperSPARC will only 
execute a single instruction per cycle. 
BCIPL Breakpoint Interrupt Level. Determines the interrupt
level to be used for all breakpoint or counter interrupts.
When generated, these interrupts behave the same as
external interrupts: they must meet the SPARC criteria
for PIL and IRL to be seen in the execution stream.
This field is cleared at reset.


E_CBK ESB Strobe on Code Address Breakpoint. Enables
the ESB pin to strobe on a code address breakpoint.
This bit is cleared at reset.

E_ZIC ESB Strobe on Zero Instruction Count. Enables the
ESB pin to strobe on a zero instruction count
expiration. This bit is cleared at reset.

E_DBK ESB Strobe on Data Address Breakpoint. Enables
the ESB pin to strobe on a data address breakpoint.
This bit is cleared at reset.

E_ZCC ESB Strobe on Zero Cycle Count. Enables the
ESB pin to strobe on a zero cycle count expiration.
This bit is cleared at reset.

I_CBK Enable Interrupt on Code Breakpoint. Enables
generation of an interrupt in response to a code
breakpoint. This bit is cleared at reset.

I_ZIC Enable Interrupt on Zero Instruction Count. Enables
generation of an interrupt in response to a zero
instruction count. This bit is cleared at reset.

I_DBK Enable Interrupt on Data Breakpoint. Enables
generation of an interrupt in response to a data
breakpoint. This bit is cleared at reset.

I_ZCC Enable Interrupt on Zero Cycle Count. Enables
generation of an interrupt in response to a zero cycle
count event. This bit is cleared at reset.


If the event is to generate a scan-based debug request, the associated
interrupt-enable (I_) bit must be cleared. When an interrupt on the event is
desired, the associated interrupt-enable bit must be set.
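The ACTION encoding can be sketched in C as below. The bit positions are assumed from Figure 15-6 (MIX in bit 12, BCIPL in bits 11:8, the E_ bits in 7:4, the I_ bits in 3:0), and `action_value` is a hypothetical helper. Leaving an event's I_ bit clear selects a scan-based debug request, as described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ACTION bit positions as read from Figure 15-6 (assumed layout). */
#define ACTION_MIX          (1u << 12)
#define ACTION_BCIPL_SHIFT  8            /* 4-bit interrupt level, 11:8 */
#define ACTION_E_CBK        (1u << 7)
#define ACTION_E_ZIC        (1u << 6)
#define ACTION_E_DBK        (1u << 5)
#define ACTION_E_ZCC        (1u << 4)
#define ACTION_I_CBK        (1u << 3)
#define ACTION_I_ZIC        (1u << 2)
#define ACTION_I_DBK        (1u << 1)
#define ACTION_I_ZCC        (1u << 0)

/* Compose an ACTION value. 'events' is an OR of ACTION_E_* and ACTION_I_*
 * bits. MIX must be set for superscalar (multiple-issue) execution. */
static uint32_t action_value(bool superscalar, unsigned level, uint32_t events)
{
    assert(level <= 15);
    assert((events & ~0xFFu) == 0);      /* only the event bits 7:0 */
    return (superscalar ? ACTION_MIX : 0u) |
           ((uint32_t)level << ACTION_BCIPL_SHIFT) | events;
}
```

Under these assumptions, `action_value(true, 15, ACTION_E_CBK | ACTION_I_CBK)` yields 0x1F88: superscalar issue, level-15 breakpoint interrupts, and an ESB strobe plus interrupt on a code address breakpoint.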
15.3 External Monitors


15.3.1 PIPE Pins 


The SSP is a highly integrated processor. SuperSPARC can execute code
continuously from its internal cache with no external indication. This can
make system debugging very difficult.


Several features are provided to reduce these problems. Scan-based debug 
provides access to the internal state of the machine. 


Additional pins (PIPE[9:0]) have been defined to provide cycle-by-cycle
observation of key internal states. These pins provide information on
activity within a clock cycle. The ESB signal has been provided as an
external trigger for debug equipment.


Scan-based debug is discussed in more detail in Chapter 22. The PIPE pins 
and ESB pin are discussed in the following sections. 


The PIPE[9:0] pins provide cycle-by-cycle information on the following events:
❑ The number of instructions that complete execution.
❑ When branches and memory operations occur (including an indication of
  taken branches).
❑ When floating-point instructions are issued.
❑ When the pipeline is being held by either floating-point operations or
  memory operations.
❑ Interrupts and exceptions.
The definitions of these signals are provided below: 


PIPE[9] Active when any valid memory reference occurred in
the E0 stage of the previous clock cycle.

PIPE[8] Active when any valid floating-point operation
occurred in the E0 stage of the previous clock cycle.

PIPE[7] Active when any valid control transfer instruction was
executed in the E0 stage of the previous clock cycle.

PIPE[6] Indicates that no instructions were available when
the group currently at the WB stage was decoded
(D0).

PIPE[5] Active when the pipeline is being held by the data
cache (generally while processing a cache miss).
PIPE[4] Active when the pipeline is being held by the
floating-point unit (FPU), because either the queue is
full or there are dependencies.

PIPE[3] Indicates whether the branch in the E0 stage of the
previous cycle was taken (1) or not taken (0).

PIPE[2:1] Indicates the number of instructions in the E0 stage
of the current cycle: 00 = none, 01 = 1, 10 = 2, 11 = 3.

PIPE[0] Indicates that an exception or interrupt is being
signalled in the current cycle.


15.3.2 ESB

The breakpoint registers may be programmed to activate the ESB pin when 
a breakpoint is detected. This allows an external device to be triggered (oscil- 
loscope, logic analyzer, etc.). 


The ESB signal will be asserted when entering scan-based debug mode via a
code address breakpoint, data address breakpoint, zero instruction count
breakpoint, or zero cycle count breakpoint, provided the appropriate bits in
the ACTION register are set. If you are not using scan-based debug mode,
assert the ESB signal by setting up an interrupt on a code address
breakpoint, data address breakpoint, zero instruction count breakpoint, or
zero cycle count breakpoint.


Following is an algorithm to enable the ESB signal at a code address or data 
address breakpoint. 


1) Execute an STDA of the appropriate value to the ACTION register.
   ❑ Code Address Breakpoint
     To set up a request for ESB assertion on scan-based debug mode
     entry for a code address breakpoint, the ACTION.E_CBK bit must be
     set. The other bits in the ACTION register, except for the ACTION.MIX
     field, must be cleared. The value of the ACTION.MIX field will depend
     on your code. To include an interrupt, ACTION.I_CBK must also be set
     along with ACTION.E_CBK.
   ❑ Data Address Breakpoint
     To set up a request for ESB assertion on scan-based debug mode
     entry for a data address breakpoint, the ACTION.E_DBK bit must be
     set. The other bits in the ACTION register, except for the ACTION.MIX
     field, must be cleared. The value of the ACTION.MIX field will depend
     on your code. To include an interrupt, ACTION.I_DBK must also be set
     along with ACTION.E_DBK.
2) Set the breakpoint value register to the appropriate breakpoint address
   value.
3) Set the breakpoint mask register to the appropriate mask value.
4) Set the breakpoint control register to generate an interrupt or scan-based
   debug request on the code or data address match. The bits to set will
   depend on your application. (See Subsection 15.2.1.3 for more information.)
5) Clear the breakpoint status register.
6) Execute the code until the breakpoint occurs.
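The procedure above can be expressed as an ordered list of register accesses, which a debug monitor would then carry out with STDA and LDDA instructions. This is a sketch under assumptions: the register addresses within ASI 0x38 and the ACTION register's address within ASI 0x4C are not given here, so they are caller-supplied placeholders (see Table 15-2); the status register is cleared by reading it, since any ASI load from BKS clears its bits; all type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ASI_MMU_BKPT 0x38
#define ASI_ACTION   0x4C

enum bk_op_kind { OP_STDA, OP_LDDA };

struct bk_op {
    enum bk_op_kind kind;
    uint8_t  asi;
    uint32_t va;
    uint64_t value;     /* ignored for OP_LDDA */
};

/* Placeholder register addresses within each ASI, supplied by the caller. */
struct bkpt_addrs {
    uint32_t action_va;
    uint32_t bkv_va, bkm_va, bkc_va, bks_va;
};

/* Records steps 1-5 in order; step 6 is simply running the code. */
static size_t plan_address_breakpoint(struct bk_op ops[5],
                                      const struct bkpt_addrs *a,
                                      uint64_t action, uint64_t bkv,
                                      uint64_t bkm, uint64_t bkc)
{
    assert(ops != NULL && a != NULL);
    size_t n = 0;
    ops[n++] = (struct bk_op){OP_STDA, ASI_ACTION,   a->action_va, action}; /* 1 */
    ops[n++] = (struct bk_op){OP_STDA, ASI_MMU_BKPT, a->bkv_va,    bkv};    /* 2 */
    ops[n++] = (struct bk_op){OP_STDA, ASI_MMU_BKPT, a->bkm_va,    bkm};    /* 3 */
    ops[n++] = (struct bk_op){OP_STDA, ASI_MMU_BKPT, a->bkc_va,    bkc};    /* 4 */
    ops[n++] = (struct bk_op){OP_LDDA, ASI_MMU_BKPT, a->bks_va,    0};      /* 5 */
    return n;
}
```

Recording the plan rather than issuing the accesses directly keeps the ordering testable on a host machine; on the target, each entry maps to one alternate-space double-word access.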
Following is an algorithm to enable the ESB signal at a zero instruction count
or zero cycle count breakpoint.

1) Execute an STDA of the appropriate value to the ACTION register.
   ❑ Zero Instruction Count Breakpoint
     To set up a request for ESB assertion on scan-based debug mode
     entry for a zero instruction count breakpoint, the ACTION.E_ZIC bit
     must be set. The other bits in the ACTION register, except for the
     ACTION.MIX field, must be cleared. The value of the ACTION.MIX field
     will depend on your code. To include an interrupt, ACTION.I_ZIC must
     also be set along with ACTION.E_ZIC.
   ❑ Zero Cycle Count Breakpoint
     To set up a request for ESB assertion on scan-based debug mode
     entry for a zero cycle count breakpoint, the ACTION.E_ZCC bit must
     be set. The other bits in the ACTION register, except for the
     ACTION.MIX field, must be cleared. The value of the ACTION.MIX field
     will depend on your code. To include an interrupt, ACTION.I_ZCC must
     also be set along with ACTION.E_ZCC.
2) Set the counter breakpoint value register to the appropriate counter value.
3) Set the counter breakpoint control register to enable either the
   instruction or cycle counter.
4) Clear the counter breakpoint status register.
5) Execute the code until the breakpoint occurs.
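In the same spirit, the counter-breakpoint procedure can be recorded as an ordered list of writes. Assumptions: the address used within each ASI is a caller-supplied placeholder, CTRV is packed with ICNT in bits 31:16 and CCNT in bits 15:0 (Figure 15-4), and CTRS is cleared by writing zero to it, which the SSP may do because the register is readable and writable. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ASI_CTRV   0x49
#define ASI_CTRC   0x4A
#define ASI_CTRS   0x4B
#define ASI_ACTION 0x4C

struct ctr_write {
    uint8_t  asi;
    uint32_t va;      /* address within the ASI: caller-supplied placeholder */
    uint64_t value;
};

/* Records steps 1-4 in order; step 5 is simply running the code. */
static size_t plan_counter_breakpoint(struct ctr_write w[4], uint32_t va,
                                      uint64_t action, uint32_t icnt,
                                      uint32_t ccnt, uint32_t ctrc)
{
    assert(icnt <= 0xFFFFu && ccnt <= 0xFFFFu);
    size_t n = 0;
    w[n++] = (struct ctr_write){ASI_ACTION, va, action};                      /* 1 */
    w[n++] = (struct ctr_write){ASI_CTRV, va, ((uint64_t)icnt << 16) | ccnt}; /* 2 */
    w[n++] = (struct ctr_write){ASI_CTRC, va, ctrc};                          /* 3 */
    w[n++] = (struct ctr_write){ASI_CTRS, va, 0};          /* 4: clear status */
    return n;
}
```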
The MultiCache Controller (MXCC) is an optional external cache controller for
SuperSPARC Processors (SSPs). It is employed when a large secondary
cache or an interface to a non-MBus system is required.


16.1 Introduction


The SuperSPARC MXCC aids in building high-performance systems using the
SSP. It provides a number of functions in the system:

❑ Selectable bus interface for greater freedom in configuring systems.
❑ External (second-level) cache controller and tags for 1 Mbyte (MBus) or
  up to 2 Mbytes (XBus).
❑ Multiprocessor support, including cache consistency.
❑ Block copy and block fill controller.
❑ Interrupt control functions (mostly in XBus configurations).
❑ Cache block prefetching.
❑ BootBus (XBus configurations only).
❑ Memory model support (XBus configurations).
❑ VBus arbitration and control (see Chapter 18).
❑ JTAG boundary scan and internal test functions.


16.1.1 Configurations 


The MXCC offers several degrees of freedom for the design of systems using 
the SSP, as shown in Table 16-1. 


Table 16-1. SuperSPARC Chipset Configurations


The high-performance MBus system is diagrammed in Figure 16-1. The external
cache memory provides a significant performance improvement and greatly
decreases bus traffic, allowing more processors to be supported on an MBus.
Figure 16-1. High-Performance MBus System
[Diagram: SuperSPARC, MultiCache Controller, Level II MBus]


If the system bus is not an MBus, the MXCC supports external system bus
interfaces in its XBus configurations. The external system bus interface mates
the MXCC to a particular system bus. External system bus interfaces are
referred to as bus watchers (BWs). Figure 16-2 shows the high-performance
XBus system configuration.


Figure 16-2. System With External Bus Watchers 
The XBus configurations support up to four external BWs that can be used with
several system buses to increase available bandwidth between processors
and memory.


External cache RAM may not be needed in every application. Configurations
without external cache are listed as minimum configurations in Table 16-1.


The external cache (E-cache) for SuperSPARC is a large, physically addressed
secondary cache controlled by the MXCC. The MXCC stores the E-cache tags
internally and uses external synchronous SRAM chips for data storage.
Each SRAM receives the SSP's clock, PCLK.


Synchronous SRAM

Synchronous SRAMs have registers on each input and output, which allows
pipelined operation. An address is presented to an SRAM before the active
clock edge and is registered in the SRAM at that edge. The SRAM reads out the
addressed location before the next active clock edge, and the result is
stored in an output register at that edge. New addresses can be supplied at
each clock edge; new outputs appear after two clock periods of delay. Writing
works similarly: address, data, output enable, and write enable are registered
on the active clock edge and stored into the internal array during the
subsequent clock period.

The E-cache is organized as a direct-mapped cache with a nominal size of
1 Mbyte. This configuration is implemented with eight 128K x 8 or 128K x 9
synchronous SRAMs. The 128K x 9 SRAMs are needed to implement byte parity on
the E-cache data storage. Parity is directly supported by both SuperSPARC
and the MXCC.

Bus Connection

The MXCC supports a direct connection to a Level 2 MBus (see Chapter 17).
Alternatively, the XBus may be selected (see Chapter 19). XBus connects the
MXCC to an external BW that interfaces to different types of system buses.
XBus supports up to four BWs.


The MXCC is configured for either the MBus or the XBus operation with the 
MBSEL pin. A number of other parameters are altered by the choice of bus 
configuration. These are summarized in Table 16-2. 
Table 16-2. Configuration Changes With MBSEL Pin

| Parameter    | MBus Interface (MBSEL = H) | XBus Interface (MBSEL = L)           |
| E-cache size | none or 1 Mbyte            | none, 512 Kbyte, 1 Mbyte, or 2 Mbyte |
| Interrupts   | from pins                  | from XBus                            |

16.1.2 Cache Tags and Control 


The MXCC contains tags for an E-cache that acts as a second-level cache in
the system. The MXCC also controls accesses to the cache: sequencing
accesses, handling E-cache misses, selecting blocks for replacement, and
handling snoop requests from the bus interface. The cache tags can support a
direct-mapped external cache of 1 Mbyte in an MBus configuration or up to
2 Mbytes in an XBus configuration.


The internal caches in the SSP obey inclusion with respect to the E-cache:
any data in an internal cache must also be present in the E-cache. Because
only the E-cache tags need to be examined to determine whether a snoop hits
in any of the three caches, bus snooping is greatly simplified. To enforce
inclusion, whenever data is removed from the E-cache, due to either an
invalidation caused by a snoop hit or a block replacement, the same block must
also be invalidated in the internal instruction cache and the internal data
cache wherever it may reside.


When the MXCC is used in the MBus configuration, it supports either no
E-cache or 1 Mbyte of E-cache. When the MXCC is used in an XBus
configuration, it supports a variety of E-cache sizes: none, 512 Kbyte,
1 Mbyte, or 2 Mbyte.


The E-cache is blocked and sub-blocked. The sub-block size is 32 bytes in 
MBus configurations and 64 bytes in XBus configurations. There are four sub- 
blocks per block. Block size is 128 bytes in MBus configurations and 256 bytes 
in XBus configurations. 
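To make the geometry concrete, the C sketch below splits a physical address into the fields a direct-mapped, sub-blocked E-cache would use. It assumes the conventional arrangement (byte offset in the low bits, then sub-block number, block index, and tag) and a 36-bit physical address as used by BKV; the MXCC's actual bit assignment is not specified in this section, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct ecache_geometry {
    unsigned subblock_bytes;   /* 32 (MBus) or 64 (XBus) */
    unsigned cache_bytes;      /* total E-cache size */
};

struct ecache_fields {
    uint64_t tag;
    uint32_t index;      /* block number within the E-cache */
    uint32_t subblock;   /* 0..3: four sub-blocks per block */
    uint32_t offset;     /* byte within the sub-block */
};

static unsigned log2u(unsigned x)
{
    assert(x != 0 && (x & (x - 1)) == 0);   /* sizes are powers of two */
    unsigned n = 0;
    while (x >>= 1)
        n++;
    return n;
}

static struct ecache_fields ecache_split(uint64_t pa, struct ecache_geometry g)
{
    unsigned off_bits = log2u(g.subblock_bytes);
    unsigned blk_bits = off_bits + 2;                 /* 4 sub-blocks/block */
    unsigned idx_bits = log2u(g.cache_bytes) - blk_bits;
    struct ecache_fields f;

    pa &= (UINT64_C(1) << 36) - 1;                    /* 36-bit PA */
    f.offset   = (uint32_t)(pa & (g.subblock_bytes - 1u));
    f.subblock = (uint32_t)((pa >> off_bits) & 3u);
    f.index    = (uint32_t)((pa >> blk_bits) & ((1u << idx_bits) - 1u));
    f.tag      = pa >> (blk_bits + idx_bits);
    return f;
}
```

With a 1-Mbyte MBus E-cache this gives a 5-bit offset, a 2-bit sub-block number, a 13-bit index (8K blocks), and a 16-bit tag; a 512-Kbyte XBus E-cache leaves a 17-bit tag.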


The MXCC supports pipelined access to the E-cache from the SSP. The peak
data rate that can be achieved is one double-word (DW) every cycle for
either reads or writes.


The MXCC can handle one read miss and one write miss at any given time. 
In addition, some system buses can have more than one access in progress 
(XBus configurations). 
The E-cache tags are accessible for diagnostic access through control space.
The E-cache data is also accessible through control space, as are the MXCC
control registers that control the behavior of the cache controller and bus
interfaces.


16.1.3 Multiprocessor Support 


The MXCC has extensive multiprocessor support. In MBus configurations, the
MXCC contains all functions of a snooping-coherent cache and bus master.
See Chapter 17 for more information on the MXCC's behavior on MBus.


In XBus configurations, the MXCC and BWs cooperate to support cache co- 
herence and high-performance multiprocessing in the system environment. 
See Chapter 19 for more information on XBus cache consistency and proto- 
cols. 


16.1.4 Block Copy and Block Fill 


The MXCC can perform high-performance block copy and block fill operations 
in cooperation with the SSP. The MXCC's buffer can be loaded with a high-per- 
formance block read on the system bus and stored with a high-performance 
block write. The buffer holds a single cache sub-block: 32 bytes in MBus
configurations and 64 bytes in XBus configurations.


Sequences of load block and store block operations can be initiated in rapid 
succession by the SSP to perform a block copy at the system bus's burst trans- 
fer rate. 


Block fill can also be performed by loading the buffer with the fill pattern and 
by performing a series of block writes. 
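The block copy and block fill sequences can be modeled in software as below. This is only an illustration of the ordering of load block and store block operations: `memcpy` stands in for the bus bursts, alignment and the real control-space interface are not modeled, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Software stand-in for the MXCC buffer: one E-cache sub-block. */
#define MAX_SUBBLOCK 64

struct block_buffer {
    size_t size;                        /* 32 (MBus) or 64 (XBus) */
    unsigned char data[MAX_SUBBLOCK];
};

/* "Load block": one burst read of a sub-block into the buffer. */
static void load_block(struct block_buffer *b, const unsigned char *src)
{
    memcpy(b->data, src, b->size);
}

/* "Store block": one burst write of the buffer to memory. */
static void store_block(const struct block_buffer *b, unsigned char *dst)
{
    memcpy(dst, b->data, b->size);
}

/* Block copy: alternate load block and store block for each sub-block. */
static void block_copy(struct block_buffer *b, unsigned char *dst,
                       const unsigned char *src, size_t len)
{
    assert(b->size == 32 || b->size == 64);
    assert(len % b->size == 0);         /* whole sub-blocks only */
    for (size_t off = 0; off < len; off += b->size) {
        load_block(b, src + off);
        store_block(b, dst + off);
    }
}

/* Block fill: load the pattern once, then issue only block writes. */
static void block_fill(struct block_buffer *b, unsigned char *dst,
                       const unsigned char *pattern, size_t len)
{
    assert(b->size == 32 || b->size == 64);
    assert(len % b->size == 0);
    load_block(b, pattern);
    for (size_t off = 0; off < len; off += b->size)
        store_block(b, dst + off);
}
```

The fill loop issues a single buffer load followed only by block writes, matching the fill procedure described above.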


16.1.5 Interrupt Control Functions 
The MXCC directly drives the IRL[3:0] pins of the SSP.


In MBus configurations, the system presents interrupt requests to the
MXCC/SSP pair via the MXCC's IRL[3:0] pins. No additional interrupts are
generated internally inside the MXCC.


In XBus configurations, the system presents interrupt requests to the MXCC/ 
SSP pair via XBus packets. In addition, the MXCC generates level-15 inter- 
rupts to the SSP to report asynchronous errors. See Section 16.7 for more de- 
tails. 
16.1.6 Cache Block Prefetching


The MXCC supports prefetching of sub-blocks into the E-cache in order to
reduce the effective memory latency during sequential accesses to memory.
Prefetching is triggered when the MXCC services a miss from the SSP's
internal instruction or data cache and the next (sequential) sub-block within
the E-cache block is not in the E-cache.


Prefetching is controlled by a bit in the CCCR. See the description of
CCCR.PF in Subsection 16.5.3.


16.1.7 Boot Bus


In XBus configurations, the MXCC supports an eight-bit bus for communication
with local peripherals and boot ROM. See Chapter 20 for more details.


16.1.8 Memory Model Support 


The MXCC, if in an XBus configuration, can have several outstanding
transactions in the system, and it keeps track of them. To aid in supporting
the PSO memory model, the MXCC asserts PEND whenever it has a store
operation pending in the system.


See Section 8.7. 


16.1.9 VBus Arbitration and Control 


The MXCC acts as the VBus arbiter when used with the SSP. MXCC controls 
the HGNT and WGNT signals in order to control access to the VBus by the 
SSP. 


See Chapter 18. 


16.1.10 JTAG Boundary Scan and Test 


The MXCC has full boundary scan accessible by the JTAG test access port.
In addition, internal logic is testable via the JTAG port. See Section 21.6
for more details.
16.2 MXCC Block Diagram and Basic Functionality

Figure 16-3. MXCC Block Diagram
[Diagram labels include: Request Queue, MBus Interface, XBus Arbiter]

The MXCC can be partitioned into five large functional blocks:

❑ Cache Controller Core.
  This includes the E-cache tag memory, the processor command logic, and
  the bus command logic.
❑ SSP Interface.
  This includes the processor bus interface and arbitration logic.
❑ MBus/XBus Interface.
  This includes the MBus interface, the XBus interface, and the XBus
  arbiter.
❑ Queues and Synchronizers.
  This includes the input and output queues (the input queue and the
  request and reply queues) and their synchronizers.
❑ BootBus Interface.

There are two clock domains in the MXCC, delineated by the vertical dotted
line in Figure 16-3. The side connected to the SSP runs from the processor
clock (PCLK), while the side connected to the external bus runs from the bus
clock (BCLK). Special provisions allow for synchronous operation when the two
clocks are the same; see Chapter 23 for more details.


16.2.1 Cache Controller Core 


The cache controller core consists of the E-cache tag memory, the processor
command logic, and the bus command logic. The E-cache tag memory keeps track
of the usage of the external cache. The tag memory is organized as an
8K x 33-bit memory and supports 8K blocks, with four sub-blocks per block;
33 bits of information are kept for each block. These record the address tag
(up to 17 bits) and four status bits for each sub-block (16 bits). In MBus
configurations, sub-blocks are 32 bytes and 1 Mbyte of E-cache is supported.
In XBus configurations, sub-blocks are 64 bytes and up to 2 Mbytes can be
supported.
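The tag-memory figures can be checked with a little arithmetic, sketched here in C: 17 tag bits plus 4 status bits for each of 4 sub-blocks give the 33-bit entry, and 8K blocks of four sub-blocks give 1 Mbyte with 32-byte sub-blocks or 2 Mbytes with 64-byte sub-blocks. The constant and function names are illustrative only.

```c
#include <assert.h>

/* Figures from the tag-memory description above. */
enum {
    TAG_ENTRIES  = 8 * 1024,   /* one entry per E-cache block      */
    SUBBLOCKS    = 4,          /* sub-blocks per block             */
    STATUS_BITS  = 4,          /* status bits per sub-block        */
    MAX_TAG_BITS = 17,         /* address tag, up to 17 bits       */
    ENTRY_BITS   = MAX_TAG_BITS + SUBBLOCKS * STATUS_BITS
};

static_assert(ENTRY_BITS == 33, "each tag entry is 33 bits wide");

/* Largest E-cache the 8K tag entries can describe for a sub-block size. */
static unsigned long max_ecache_bytes(unsigned subblock_bytes)
{
    return (unsigned long)TAG_ENTRIES * SUBBLOCKS * subblock_bytes;
}
```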


In the MBus configurations, the E-cache tag memory is used for both E-cache 
access and bus snooping. When used with XBus, the tag memory is used only 
for E-cache accesses. Snooping in the XBus configuration is accomplished by 
a bus watcher that provides an interface between XBus and a system bus. 


The processor command logic is a finite state machine that handles incoming 
processor commands. If a processor requires access to either MBus or XBus, 
the command logic generates a bus command through the request queue. The 
command logic also deals with acknowledgements of bus requests and, in the 
case of reads, delivers the requested data. 


The bus command logic handles all the requests from the bus in the input 
queue, then places replies to them in the output queue. Together with the 
MBus interface, it implements the MBus cache-consistency protocol. The bus 
command logic, when combined with a suitable bus watcher on XBus, can pro- 
vide other cache-consistency protocols. In response to an external bus re- 
quest, the command logic may access E-cache tag memory, the E-cache 
RAMs, and/or the processor. 
16.2.2 SSP Interface


The MXCC interfaces to the SSP via VBus (see Chapter 18). The MXCC's SSP
interface implements this connection and consists of the processor bus
interface and arbitration logic.


The processor bus interface provides the bus command logic with the illusion 
of a free SSP. Thus the bus command logic can issue writes to E-cache with 
no arbitration. The interface logic uses a buffer to store up to nine cycles of 
VBus accesses from the bus command logic. Since the interface logic buffers 
VBus accesses, VBus arbitration is hidden from the bus command logic. 


The processor bus interface logic latches in all the signals from the processor
(except for a few control strobes) before using them in the MXCC. This logic
also latches all the output signals before driving them out.


The processor arbitration logic is responsible for arbitrating the usage of VBus 
among the SSP, the processor command logic, and the bus command logic. 
Read and write cycles are arbitrated separately on VBus via separate read 
grant and write grant lines to the processor. 


16.2.3 MBus/XBus Interface 
The MBus/XBus interface is composed of the MBus interface, the XBus
interface, and the XBus arbiter. MXCC operates either the MBus interface or the
XBus interface, as selected by the MBSEL pin. When MBSEL is high, the 
MBus interface is selected. 


The MBus interface handles the MXCC's interface to a system bus using the
MBus level 2 protocol (see Chapter 17). The MBus interface receives com- 
mands generated by the processor command logic in the request queue and 
initiates MBus transactions. The replies from these transactions are placed in 
the input queue for the bus command logic. 


All requests on the MBus, except for non-cacheable accesses, are put into the 
input queue to send to the bus command logic for snooping. For non-cache- 
able accesses, the MBus interface decodes the address and places only ac- 
cesses to this module in the input queue. The MBus interface takes the reply 
to a pending MBus request, either one addressed to this module or a snoop 
request, from the reply queue and delivers it to the MBus. 


The XBus Interface handles the MXCC's side of the XBus protocol (see Chap- 
ter 19). The XBus Interface takes a command generated by the processor 
command logic from the request queue and initiates an XBus transaction. 
When a reply packet is returned on the bus, the interface puts it into the input
queue. VBus transactions initiated by the processor are processed by the pro- 
cessor command logic and sometimes generate a request by placing an entry 
in the request queue. 
All requests on XBus addressed to the MXCC are put into the input queue to
send to the bus command logic. The XBus interface takes the reply of a pend- 
ing XBus request from the reply queue and delivers it to the XBus. 


The XBus arbiter controls access to the XBus among the MXCC and the BWs. 
Up to four BWs are supported by the XBus arbiter. The bus is granted accord- 
ing to arrival order and priority of bus request signals from the BWs. (See Sec- 
tion 19.10.) 


16.2.4 Queues and Synchronizers 


The input queue, request queue, and reply queue are first-in, first-out (FIFO)
queues used to communicate between the two clock domains. They are imple- 
mented with dual-port register files. The input queue is a 16 x 73-bit buffer, 
while the request queue is made up of two 10 x 65-bit buffers. The reply queue 
consists of three 10 x 65-bit buffers. The input queue is read with PCLK and 
written with BCLK, while the request and reply queues are read with BCLK
and written with PCLK. 


Control strobes are sent between the two domains through synchronizers. The 
synchronizers can be defeated for synchronous operation, where PCLK and 
BCLK are the same (see Chapter 23). 


16.2.5 Boot Bus Interface 


The boot bus interface handles access to the boot bus. The BootBus interface
implements the address and data multiplexing functions to the boot bus, along
with the automatic polling of interrupts (see Chapter 20). 
16.3 Control Space Access


The SSP's ASI 0x02 allows access to the information in an external device (in
this case, the MXCC) via VBus's control space. These accesses are
non-cacheable, regardless of the MCNTL.AC indication. SuperSPARC references
to this control space may be any size and are properly aligned on the
VBus. Any fault is reported as a data access exception.


The MXCC's control space occupies different addresses, depending on
whether the MXCC is in an MBus or XBus configuration and whether it is
addressed from the system bus or the processor. Table 16-3 shows control
space base addresses for processor and bus access from MBus and XBus
configurations.


Table 16-3. MXCC Control Space Base Addresses

Access From | Base Address
Processor (VBus) | ADDR[35:25] ignored; ADDR[24] = 1
MBus | MAD[35:24] = 0xFFn, where n is MID[3:0]
XBus | PA[35:25] ignored; PA[24] = 1
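As an illustration only, the base addresses in Table 16-3 can be computed as below. The enum and function names are hypothetical, and bits that the table marks as ignored are simply returned as zero.

```c
#include <stdint.h>

/* Where the control space access comes from (Table 16-3). */
enum mxcc_view { MXCC_FROM_PROCESSOR, MXCC_FROM_MBUS, MXCC_FROM_XBUS };

/* Return the MXCC control space base as a 36-bit physical address.
 * Bits the table marks "ignored" are returned as zero. */
uint64_t mxcc_cspace_base(enum mxcc_view view, unsigned mid)
{
    switch (view) {
    case MXCC_FROM_MBUS:
        /* MAD[35:24] = 0xFFn, where n is this module's MID[3:0] */
        return (uint64_t)(0xFF0u | (mid & 0xFu)) << 24;
    case MXCC_FROM_PROCESSOR: /* ADDR[24] = 1, reached through ASI 0x02 */
    case MXCC_FROM_XBUS:      /* PA[24] = 1 */
    default:
        return (uint64_t)1 << 24;
    }
}
```

On MBus each module therefore answers at its own 16-Mbyte window, selected by its module ID.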
The E-cache data, E-cache tags, and the MXCC's registers are all accessed
via control space. The address within control space for each of these is shown
in Table 16-4.









Table 16-4. MXCC Control Space Address Decoding

Space | Starting Offset | Decoding | Address Format
E-cache Data | 0x000000 | PA[23] = 0 | Figure 16-5
E-cache Tags | 0x800000 | PA[23:22] = 10 | Figure 16-6
MXCC Registers | 0xC00000 | PA[23:22] = 11; PA[21:20] reserved | Figure 16-8

PA = Physical Address
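A minimal sketch of the decoding in Table 16-4, assuming the caller passes a physical address that already falls within MXCC control space; only PA[23:22] is examined. The names are illustrative, not part of any MXCC software interface.

```c
#include <stdint.h>

/* The three regions of MXCC control space (Table 16-4). */
enum mxcc_region { MXCC_ECACHE_DATA, MXCC_ECACHE_TAGS, MXCC_REGISTERS };

/* Classify a control space address by PA[23:22]. */
enum mxcc_region mxcc_decode_region(uint64_t pa)
{
    if (((pa >> 23) & 1u) == 0)
        return MXCC_ECACHE_DATA;   /* PA[23] = 0     -> offset 0x000000 */
    if (((pa >> 22) & 1u) == 0)
        return MXCC_ECACHE_TAGS;   /* PA[23:22] = 10 -> offset 0x800000 */
    return MXCC_REGISTERS;         /* PA[23:22] = 11 -> offset 0xC00000 */
}
```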
Any illegal access by the processor to the MXCC control space generates a
timeout (TO) bus error. Illegal accesses include accesses to illegal addresses,
atomic load-stores to any MXCC register, and writes to read-only registers.
Illegal accesses to the MXCC from the MBus or XBus are ignored, and it is the
system's responsibility to generate timeout responses.
16.4 External Cache (E-Cache)


The MXCC controls SuperSPARC's E-cache. The E-cache is a direct-mapped
cache: any particular byte of the physical address space can reside at only
one location in the cache, and many different bytes map to that same
location. The E-cache is a copy-back cache; writes into the E-cache do
not propagate to main memory until the cache block is replaced. Figure 16-4
shows the organization of the external cache memory.


Figure 16-4. E-Cache Organization

(8192 blocks, each with a set of sub-block tags. A sub-block is 32 bytes on MBus and 64 bytes on XBus. Direct mapped, one-way associative.)


With the MXCC on MBus, the external cache can store 1 Mbyte. With MXCC 
on XBus, the E-cache can be configured to store 2 Mbytes, 1 Mbyte, or 512 
Kbytes. See Table 16-5 for CCCR bit settings for various E-cache sizes. The 
width and position of many fields for E-cache access depend on the configura- 
tion of the E-cache. 
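Table 16-5. MXCC Effective Size of E-Cache

Configuration | CCCR.CS | CCCR.HC | Effective E-Cache Size
MBus | ignored | ignored | 1 Mbyte
XBus | 0 | 0 | 1 Mbyte
XBus | 0 | 1 | 512 Kbytes
XBus | 1 | 0 | 2 Mbytes
XBus | 1 | 1 | 1 Mbyte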

The E-cache is a unified cache; it combines instructions and data in a single 
cache and supplies both data and instructions to the processor. The MXCC 
maintains the inclusion property on SuperSPARC's on-chip caches with
respect to the E-cache, which means that there will be no cache block in either
of SuperSPARC's internal caches that is not also in the E-cache. Inclusion is
maintained by allocating a block in the E-cache for any cacheable data accessed
by the processor, and by invalidating the corresponding blocks in the internal 
caches whenever an E-cache block is invalidated. 


Each cache block contains four sub-blocks. The sub-block size is 32 bytes in 
MBus configurations and 64 bytes in XBus configurations. Block size is, there- 
fore, 128 bytes in MBus systems and 256 bytes in XBus configurations. The 
block size remains the same across the various XBus cache sizes. 
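The direct-mapped organization can be made concrete with a short sketch that splits a physical address into the fields used for an E-cache lookup. It assumes the effective cache size is a power of two and that each block holds four sub-blocks, as described above; the struct and function names are hypothetical. Addresses that differ only in the tag bits land in the same block, which is the conflict a direct-mapped cache cannot avoid.

```c
#include <stdint.h>

struct ecache_geom {
    unsigned subblock_bytes;  /* 32 (MBus) or 64 (XBus) */
    uint64_t size_bytes;      /* effective E-cache size (Table 16-5) */
};

struct ecache_split {
    uint64_t tag;       /* high-order bits, compared against AddT */
    uint64_t block;     /* the single block this address can occupy */
    unsigned subblock;  /* which of the four sub-blocks */
    unsigned offset;    /* byte within the sub-block */
};

struct ecache_split ecache_split_pa(struct ecache_geom g, uint64_t pa)
{
    uint64_t block_bytes = 4u * (uint64_t)g.subblock_bytes; /* 4 sub-blocks */
    uint64_t nblocks = g.size_bytes / block_bytes;
    struct ecache_split s;
    s.offset   = (unsigned)(pa % g.subblock_bytes);
    s.subblock = (unsigned)((pa / g.subblock_bytes) % 4u);
    s.block    = (pa / block_bytes) % nblocks;
    s.tag      = pa / g.size_bytes;
    return s;
}
```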


In MBus configurations, non-cacheable accesses by the processor generate 
Level-1 MBus requests. In XBus configurations, non-cacheable accesses are 
processed similarly to a read miss or a shared write, but the E-cache data is 
not updated, and no sub-block is ever replaced. 


The E-cache can be accessed directly from SuperSPARC in the MXCC’s con- 
trol space for the purposes of initialization, testing, and diagnostics. 


16.4.1 E-Cache Data Access 


E-cache data memory can be accessed directly in the MXCC control space for 
testing, initialization, and diagnostics. The E-cache data can be accessed with 
a byte, half-word, word, or double-word access. The format of an address
within control space depends on MBSEL and on the cache size (CS) and half
cache (HC) bits in the MXCC control register (CCCR). The CCCR.CS and 
CCCR.HC bits are ignored when MXCC is used on the MBus. The field widths 
are shown in Table 16-6; the field positions are shown in Table 16-7. 
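Table 16-6. E-Cache Data Address Field Widths (in bits)

Field | Description | MBus | XBus 512 KB | XBus 1 MB | XBus 2 MB
res | Reserved | 3 | 4 | 3 | 2
BLK | Block | 13 | 11 | 12 | 13
SUB | Sub-block | 2 | 2 | 2 | 2
DBL | Double Word | 2 | 3 | 3 | 3
BYTE | Byte | 3 | 3 | 3 | 3


Table 16-7. E-Cache Data Address Field Positions

Field | Description | MBus | XBus 512 KB | XBus 1 MB | XBus 2 MB
res | Reserved | [22:20] | [22:19] | [22:20] | [22:21]
BLK | Block | [19:7] | [18:8] | [19:8] | [20:8]
SUB | Sub-block | [6:5] | [7:6] | [7:6] | [7:6]
DBL | Double Word | [4:3] | [5:3] | [5:3] | [5:3]
BYTE | Byte | [2:0] | [2:0] | [2:0] | [2:0]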
The address of a byte in E-cache is shown in Figure 16-5. The field widths
from Table 16-6 and field positions from Table 16-7 are needed to complete
the address description for a particular E-cache configuration.





Figure 16-5. Addressing E-Cache Data 


| CSPACE | 0 | res | BLK | SUB | DBL | BYTE |
35    24  23  (positions of the remaining fields: see Table 16-6 and Table 16-7)
CSPACE Control Space Base Address. Base address of the 
MXCC control space as shown in Table 16-3. 
res Reserved. These bits are ignored and should be
zero. The width and position of this field vary with the
E-cache configuration and are shown in Table 16-6
and Table 16-7, respectively.


BLK Block Number. This selects the block of the E-cache 
data to access. The width and position for this field 
vary with E-cache configuration and are shown in 
Table 16-6 and Table 16-7, respectively. 
SUB Sub-block Number. This selects a sub-block of the
E-cache data to access. The width and position of this
field vary with the E-cache configuration and are 
shown in Table 16-6 and Table 16-7, respectively. 


DBL Double Word. This selects a double word in an
E-cache sub-block to access. The width and position of
this field vary with the E-cache configuration and are 
shown in Table 16-6 and Table 16-7, respectively. 


BYTE Byte. This selects a byte within the selected double
word of the E-cache sub-block.
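Because BLK, SUB, DBL, and BYTE sit next to one another below PA[23] (Figure 16-5), a diagnostic data address is just the control space base plus a linear byte offset into the E-cache. The sketch below assumes `cspace_base` comes from Table 16-3 and that the caller supplies a block number valid for the current configuration; the function name is hypothetical.

```c
#include <stdint.h>

/* Build a control space address for diagnostic E-cache data access.
 * subblock_bytes is 32 (MBus) or 64 (XBus); a block is 4 sub-blocks. */
uint64_t ecache_data_addr(uint64_t cspace_base, unsigned subblock_bytes,
                          uint64_t block, unsigned sub, unsigned dbl,
                          unsigned byte)
{
    uint64_t block_bytes = 4u * (uint64_t)subblock_bytes;
    uint64_t off = block * block_bytes                       /* BLK  */
                 + (uint64_t)(sub & 3u) * subblock_bytes      /* SUB  */
                 + (uint64_t)(dbl % (subblock_bytes / 8u)) * 8u /* DBL */
                 + (byte & 7u);                               /* BYTE */
    return cspace_base | off;  /* PA[23] stays 0 for data accesses */
}
```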


16.4.2 External Cache Tags 


The external cache tags can be accessed in the control space for the purposes 
of testing, initialization, and diagnostics. The address format is similar to the 
E-cache data access address format. Only the block number field is used for 
E-cache tag access. 


The size and position of several fields in the address vary according to the
E-cache configuration. The sizes and positions of these fields are shown in
Table 16-8 and Table 16-9, respectively.


Table 16-8. E-Cache Tag Address Field Widths (in bits)

Field | Description | MBus | XBus 512 KB | XBus 1 MB | XBus 2 MB
reserved | Reserved | 7 | 8 | 8 | 8
BLK | Block | 13 | 11 | 12 | 13
r | Reserved | 2 | 3 | 2 | 1


Table 16-9. E-Cache Tag Address Field Positions

Field | Description | MBus | XBus 512 KB | XBus 1 MB | XBus 2 MB
reserved | Reserved | [6:0] | [7:0] | [7:0] | [7:0]
BLK | Block | [19:7] | [18:8] | [19:8] | [20:8]
r | Reserved | [21:20] | [21:19] | [21:20] | [21]


The format of an E-cache tag address is shown in Figure 16-6. 
Figure 16-6. Addressing E-cache Tags


| CSPACE | 10 | r | BLK | reserved |
35    24  23 22  (positions of the remaining fields: see Table 16-8 and Table 16-9)
CSPACE Control Space Base Address. Base address of the
MXCC control space as shown in Table 16-3.


r, reserved Reserved. These fields are ignored and should be 
zero. The widths and positions of these fields vary 
with the E-cache configuration and are shown in 
Table 16-8 and Table 16-9, respectively. 


BLK Block Number. This field selects the tag of one 
E-Cache block to access. The width and position of 
this field vary according to the E-cache configuration 
and are shown in Table 16-8 and Table 16-9, re- 
spectively. 
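For tag access only BLK is significant, so a tag address is the control space base, PA[23:22] = 10, and the block number shifted left by 7 (MBus, 128-byte blocks) or 8 (XBus, 256-byte blocks). A hypothetical helper, assuming `cspace_base` is taken from Table 16-3:

```c
#include <stdint.h>

/* Control space address of the tag for one E-cache block.
 * subblock_bytes is 32 (MBus) or 64 (XBus). */
uint64_t ecache_tag_addr(uint64_t cspace_base, unsigned subblock_bytes,
                         uint64_t block)
{
    uint64_t block_bytes = 4u * (uint64_t)subblock_bytes; /* 128 or 256 */
    return cspace_base
         | 0x800000u                 /* PA[23:22] = 10: E-cache tags */
         | (block * block_bytes);    /* BLK at bit 7 (MBus) or 8 (XBus) */
}
```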


The format of an E-cache tag entry is shown in Figure 16-7. E-cache tag en- 
tries are of DW length. Accesses to E-cache tags are assumed to be DW 
length; the VBus size bits (SIZE[1:0]) are ignored, and DW data is always sup- 
plied. 


The field sizes and positions of the rest and AddT fields depend on the E- 
cache size selected. The sizes of these fields are shown in Table 16-10, and 
their positions are shown in Table 16-11. 


Table 16-10. E-Cache Tag Field Widths (in bits)
Figure 16-7. External Cache Tag Format


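(Tag entry, bits 35 through 0, containing the res, AddT, rest, S, O, V, and P fields described below; bits 15 through 0 are labeled individually.)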

res Reserved. Read as 0 and ignored on write. 
AddT Address Tag. This is the high-order part of the
address that matches this E-cache block. The middle
part of the address selects a block of the cache
according to the direct-mapped scheme and is implied
in the address comparisons. The lower-order portion
of the address is used to select the sub-block, double
word, and byte to access. E-cache lookups compare
the corresponding field of the physical address with
the AddT field of the E-cache tags for the E-cache
block selected by the physical address. If the AddT
field matches and the addressed sub-block is valid,
the access is a cache hit.


The size and position of this field vary depending on 
the E-cache configuration and are shown in 
Table 16-10 and Table 16-11. 


rest Reserved. Read as 0 and ignored on write. The size 
and position of this field vary depending on the E- 
cache configuration and are shown in Table 16-10 
and Table 16-11. 


S Shared. When set, the corresponding sub-block is 
shared. When clear, the sub-block is held exclusive- 
ly, if valid. There are four S bits, one for each sub- 
block. 


O Owner. When set, this processor has a dirty copy of
the data in this sub-block. The data will be supplied 
by this cache to the bus if another processor requests 
it from memory. If this block is replaced, this 
sub-block must be copied back to memory. If clear, 
main memory or another cache is the owner of this 
data. There are four O bits, one for each sub-block.
V Valid. When set, this sub-block contains valid data
and can be used. When clear, this sub-block does not 
contain valid data. Attempts to access an invalid sub- 
block will cause a cache miss. There are four V bits, 
one for each of the four sub-blocks in a block. 


P Pending. When set, a VBus operation is pending on 
this sub-block. This bit is set automatically when 
MXCC has issued an operation on the system bus for 
this sub-block and is cleared automatically when the 
operation completes. There are four P bits, one for 
each of the four sub-blocks. 
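The S, O, and V bits together give the coherence state of each sub-block. The sketch below classifies one sub-block from a tag entry read via control space. The bit positions (S in 15:12, O in 11:8, V in 7:4, P in 3:0, with sub-block i at bit i of each group) are assumptions inferred from the field order in Figure 16-7, not stated in the text; verify them before relying on this. All names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* ASSUMED bit positions; check Figure 16-7 before use. */
#define TAG_S_SHIFT 12
#define TAG_O_SHIFT 8
#define TAG_V_SHIFT 4
#define TAG_P_SHIFT 0

static unsigned tag_bit(uint64_t tag, unsigned shift, unsigned sub)
{
    return (unsigned)((tag >> (shift + sub)) & 1u);
}

enum sub_state { SUB_INVALID, SUB_EXCLUSIVE_CLEAN, SUB_SHARED_CLEAN,
                 SUB_EXCLUSIVE_OWNED, SUB_SHARED_OWNED };

enum sub_state ecache_sub_state(uint64_t tag, unsigned sub)
{
    if (!tag_bit(tag, TAG_V_SHIFT, sub))
        return SUB_INVALID;                 /* V clear: access misses */
    bool shared = tag_bit(tag, TAG_S_SHIFT, sub);
    bool owner  = tag_bit(tag, TAG_O_SHIFT, sub);
    if (owner)
        return shared ? SUB_SHARED_OWNED : SUB_EXCLUSIVE_OWNED;
    return shared ? SUB_SHARED_CLEAN : SUB_EXCLUSIVE_CLEAN;
}

/* Hit: the AddT values match and the addressed sub-block is valid. */
bool ecache_hit(uint64_t tag_addt, uint64_t pa_addt, uint64_t tag,
                unsigned sub)
{
    return tag_addt == pa_addt && tag_bit(tag, TAG_V_SHIFT, sub);
}

/* On replacement, valid owned (dirty) sub-blocks must be copied back. */
bool ecache_needs_copyback(uint64_t tag, unsigned sub)
{
    return tag_bit(tag, TAG_V_SHIFT, sub) && tag_bit(tag, TAG_O_SHIFT, sub);
}
```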
16.5 MXCC Internal Registers


The MXCC contains a number of registers that control the operation of 
E-cache, bus, and other functions or that sense the status of MXCC functions. 
These registers are accessible in the MXCC’s control space (see Table 16-3). 


The address format for accessing MXCC registers is shown in Figure 16-8. 


Figure 16-8. Addressing MXCC Registers 


| CSPACE | 11 | reserved | SEL | r | DBL | zero |
35    24  23 22  21     12  11   8  7 6  5   3  2   0
CSPACE Control Space Base Address. Base address of the
MXCC's control space as shown in Table 16-3.
reserved, r Reserved. This field is ignored and should be zero.
SEL Select Field. This field selects one of the MXCC's
internal registers. Table 16-12 shows the selectors for
the various MXCC registers.
DBL Double Word. This field is ignored except when
accessing the stream data register (SEL = 0x0).
zero This field must be zero.


The SEL field is used to select one of the MXCC internal registers. The
correspondence of SEL values to registers is shown in Table 16-12. This table also
shows an abbreviated register name, the width of the register, and the section 
in this chapter where the register is described in detail. 
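Following Figure 16-8, a register address places SEL in PA[11:8] and DBL in PA[5:3] above the control space base with PA[23:22] = 11. This sketch (hypothetical names) builds such an address and flags the reserved selectors 0x9 and 0xD; it does not attempt to map other selector values to registers.

```c
#include <stdint.h>
#include <stdbool.h>

/* Address of an MXCC register; dbl matters only for the stream data
 * register (SEL = 0x0). PA[2:0] is left zero as required. */
uint64_t mxcc_reg_addr(uint64_t cspace_base, unsigned sel, unsigned dbl)
{
    return cspace_base
         | 0xC00000u                       /* PA[23:22] = 11 */
         | ((uint64_t)(sel & 0xFu) << 8)   /* SEL in PA[11:8] */
         | ((uint64_t)(dbl & 0x7u) << 3);  /* DBL in PA[5:3] */
}

/* Reserved selectors: ignored from the bus, TO error from the processor. */
bool mxcc_sel_reserved(unsigned sel)
{
    return sel == 0x9u || sel == 0xDu;
}
```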
Table 16-12. MXCC Register Selectors
Accesses to registers are assumed to be double-word accesses. On an access
from VBus, the size bits, SIZE[1:0], are ignored. During read accesses,
the contents of a 32-bit register are driven onto DATA[31:0], and the contents 
of a 16-bit register are driven onto DATA[15:0]. The 64-bit registers drive all 
data bits, DATA[63:0]. The unused bits are driven low on read and ignored on 
write. An atomic load-store operation to any of the registers is not supported. 
A TO bus error is reported when the processor issues an atomic load-store to 
any MXCC register. 


Similarly, system bus access assumes 64-bit access. On reads, registers 
whose widths are less than 64 bits return unpredictable data, except in fields
defined for that register. On writes, the unused bits are ignored. 


References to reserved register selectors (0x9 or 0xD) are ignored from the
system bus and reported as a TO bus error when issued from the processor. 
16.5.1 Reference/Miss Count Register


The reference/miss count register can be used to calculate the hit ratio of the
E-cache. The register has two fields, CMC and CRC. Both are 32-bit counters,
and both may be read and written; however, the two fields must be accessed
together as a single DW.


Figure 16-9. MXCC Reference / Miss Count Register 


| CMC | CRC |
63 32 31 0


CMC Cache Miss Counter. This field counts E-cache misses.
CRC Cache Reference Counter. This field counts E-cache
references, both hits and misses.


When the high-order bit of the CRC becomes 1, both CMC and CRC are frozen 
until the bit is cleared by software. CMC will wrap around to zero if it reaches 
its maximum count. CRC will freeze at its maximum count and therefore can- 
not wrap around. 


Every access is counted in CRC, and those that miss the E-cache are also
counted in CMC. In normal usage CMC never exceeds CRC, since software
normally resets both fields to zero before starting a measurement.


When the RC field in the CCCR is set, only read accesses will be counted in 
CRC and only read misses will be counted in CMC. When CCCR.RC is clear,
both read and write accesses are counted in CRC, and both read and write
misses are counted in CMC.
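A sketch of computing the hit ratio from a value read from this register, assuming CMC occupies bits 63:32 and CRC bits 31:0 (the figure is only partly legible, so treat this layout as an assumption). The helper names are hypothetical. A result of -1.0 signals that no references were counted or that the counters are inconsistent, for example after CMC has wrapped.

```c
#include <stdint.h>

/* ASSUMED layout: CMC in bits 63:32, CRC in bits 31:0. */
#define RMC_CMC(r) ((uint32_t)((r) >> 32))
#define RMC_CRC(r) ((uint32_t)(r))

/* When CRC's high-order bit is 1, both counters are frozen. */
int rmc_frozen(uint64_t rmc)
{
    return (int)((RMC_CRC(rmc) >> 31) & 1u);
}

/* Hit ratio = (references - misses) / references. */
double ecache_hit_ratio(uint64_t rmc)
{
    uint32_t misses = RMC_CMC(rmc);
    uint32_t refs   = RMC_CRC(rmc);
    if (refs == 0 || misses > refs)
        return -1.0;
    return (double)(refs - misses) / (double)refs;
}
```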


16.5.2 MXCC Built-In Self-Test (BIST) Register 


The built-in self-test (BIST) register is a 32-bit register. When the BIST register 
is written, a BIST is initiated. When read, it returns the signature of the last 
BIST operation performed. If no BIST operation has been performed, the value 
is indeterminate. The signature in the BIST register after running BIST should 
be compared to the known good signature for this revision of the device. Good 
signatures for current and previous revisions of the MXCC may be found in the 
data sheet. The value stored in the BIST register cannot be altered by writes, 
since writing it initiates a BIST operation. 

The format for reading of the BIST register is shown in Figure 16-10. Data is 
ignored on writes to the BIST register. 


Figure 16-10. MXCC BIST Register Format 


31 30 0 




sig Signature. This field contains the signature from the 
last BIST operation performed. If no BIST operation 
has been performed, the value is indeterminate. 


This register is writable from VBus only. When the SSP writes to the BIST
register, the MXCC immediately deasserts RGRT and WGRT. All pending
operations are allowed to complete, after which an internal BIST operation begins.
During the BIST operation, all bi-directional outputs are high-impedance, and 
all uni-directional outputs are deasserted. During the BIST operation, the 
MXCC does not respond to any inputs from the system bus (MBus or XBus). 
At the end of the BIST operation, the MXCC resets itself and asserts both 
RGRT and WGRT. 


16.5.3 MXCC Control Register (CCCR) 


The CCCR contains many of the flags that control the operation of the MXCC. 
This is a 32-bit read/write register. All flags are cleared at system reset 
(RSTIN). The fields of the control register are shown in Figure 16-11. 


Figure 16-11. MXCC Control Register (CCCR)
The fields of the MXCC Control Register are:
reserved Read as zero and ignored on write. 


RC Read reference count. When this bit is set, only read 
references are counted in the Reference/Miss Count 
Register. When clear, both read and write accesses 
are counted. 
BWC Bus watcher count. This specifies the number of bus
watchers connected to the XBus (see Table 16-13). 
Notice that the MXCC supports 1, 2, or 4 BWs, but not 
3. This field is ignored in the MBus configuration. In 
the XBus configuration, this value must be set by 
software before any system bus command is issued. 


Table 16-13. MXCC Encoding of BWC Field 


| BWC | Number of BWs 





Write invalidate. When this bit is set, a write to shared 
data (except for atomic load-store operations)
invalidates any copies in the other caches. If there is a
pending operation on the sub-block (as indicated by 
the sub-block's P bit), invalidation in other caches is 
not performed. This bit is used only in the XBus con- 
figuration and is ignored in the MBus configuration. 


Prefetch enable. When this bit is set, a prefetch is 
triggered on every burst read access (e.g., fetch of a 
cache block into the SSP's internal instruction or data
cache) whose next sequential sub-block is not in E- 
cache, although prefetch will not cross a block 
boundary. Only one outstanding prefetch is allowed 
at any one time. Individual prefetches may occasion- 
ally be skipped if there is temporary congestion of re- 
sources within the MXCC. 


Multiple command enable. When this bit is cleared, 
the MXCC will not accept a new operation from the 
SSP until it completes any previously initiated opera- 
tion. When this bit is set, the MXCC may issue multi- 
ple commands to the bus or into the output FIFO with- 
out waiting for the completion of previous com- 
mands. 
PE Parity enable. When set, even parity is generated
and checked for each byte on VBus. When the bit is 
cleared, parity checking is disabled, and odd parity is 
generated. By generating odd parity when parity 
checking is disabled, parity errors can be injected 
into the E-cache for the purpose of verifying parity- 
detection circuits on VBus. 


CE E-cache enable. E-cache is enabled when this bit is 
set and disabled when it is clear. When the E-cache 
is disabled, normal cacheable accesses by the pro- 
cessor do not read or write the E-cache data, but 
diagnostic access can still be performed via control 
space. 


CS E-cache Size. When this bit is set, the MXCC
supports a 2-Mbyte E-cache. When this bit is clear,
1-Mbyte E-cache is supported. This bit is ignored in 
MBus configurations. The effective size of E-cache is 
determined by CS and HC, as shown in Table 16-5. 


HC Half cache. When this bit is set, only the lower half of 
the E-cache is used. When this bit is cleared, all of the 
E-cache is used. This bit is ignored in MBus configu- 
rations. The effective size of E-cache is determined 
by CS and HC, as shown in Table 16-5. 
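The interaction of CS and HC reduces to a few lines. This hypothetical helper returns the effective E-cache size in bytes for a given configuration, following the CS and HC bit descriptions above.

```c
#include <stdint.h>
#include <stdbool.h>

/* Effective E-cache size in bytes. On MBus, CS and HC are ignored. */
uint32_t ecache_effective_bytes(bool mbus, bool cs, bool hc)
{
    if (mbus)
        return 1u << 20;                          /* always 1 Mbyte */
    uint32_t full = cs ? (2u << 20) : (1u << 20); /* CS: 2 MB or 1 MB */
    return hc ? full / 2u : full;                 /* HC: lower half only */
}
```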


16.5.4 MXCC Status Register (CCSR) 


The MXCC status register (CCSR) is a 64-bit register that shows the internal 
state of the MXCC. The status register is readable and writable and is accessi- 
ble through the JTAG port. The register is writable for diagnostic purposes. 
Writing this register during normal operation will cause unpredictable results. 
It is cleared at system reset (RSTIN). The format of CCSR is shown in 
Figure 16-12. 


Figure 16-12. MXCC Status Register (CCSR) 




The fields of the MXCC Status Register are: 
reserved Reserved. Read as zero and ignored on write. 
SXP Store exception pending. This bit is set when a store
operation from the SSP is to be retried, but the bus
transaction it triggered generates an exception. The
exception is not passed to the processor until the
store is retried. When the store is retried, the MXCC
signals the error code to the SSP in the usual way
with МЕХО, WADY, and RETRY and clears the SXP
bit. A store operation is retried on a store miss and on
a shared write in the MBus configuration.


Synchronous mode. This bit shows the status of the
SYNC pin and is read-only. This bit reads as a 1 if
SYNC is asserted (low).


NCSID Non-cacheable store bus watcher ID. This field
indicates the BW ID of the BW with any pending
non-cacheable stores. It is used only in the XBus
configuration. Multiple non-cacheable stores may be
pending only within the same page and via the same
BW. There is no such restriction in the MBus
configuration, where the MXCC can accept up to two
non-cacheable stores into the request queue.


NCSPA Non-cacheable store page address. This field
contains the page address of any pending
non-cacheable stores. Multiple non-cacheable stores may be
pending only within the same page and BW. There is
no such restriction in the MBus configuration, where
the MXCC can accept up to two non-cacheable
stores into the request queue.


NCSPC Non-cacheable store pending count. This four-bit
counter keeps track of the number of pending
non-cacheable store operations. The PEND pin will
be asserted if NCSPC is not zero. The SSP uses the
PEND signal for memory model support; see Section
8.7 for more details.


SPC Store pending count. This four-bit counter keeps
track of the number of pending cacheable store
operations. The PEND pin will be asserted if SPC is not
zero. The SSP uses the PEND signal for memory
model support; see Section 8.7 for more details.
BC Boot communication. This bit is readable and
writable in the MXCC status register and from JTAG
scan. The bit can be used to communicate between 
software running on the SSP and a JTAG scan con- 
troller system. 


WP Write miss pending. Set when any write miss has 
been issued to the bus interface and has not yet com- 
pleted. 


RP Read pending. Set when any read (except for a pre- 
fetch read) has been issued to the bus interface and 
has not yet completed. 


PP Prefetch pending. Set when any prefetch read has 
been issued to the bus interface but has not yet com- 
pleted. 


16.5.5 MXCC Reset Register 


The reset register is a 32-bit register that shows the type of reset that last oc- 
curred; see Chapter 13 for more details. The reset register is readable, writ- 
able, and JTAG-scannable. The format of the reset register is shown in 
Figure 16-13. 


Figure 16-13. Reset Register


| res | WD | SI | res |
31   3  2    1    0
The fields of the Reset Register are: 
res Reserved. Read as zero and ignored on write. 
WD Watchdog (WD) reset. When ERROR is asserted by
the SSP, the MXCC sets the WD bit. Writing a 1 to this
bit clears it. In the MBus configuration, ССЕНА will be
asserted when WD is 1.


SI Software Internal Reset. When this bit is written as 1,
a software internal reset is generated. On software
internal reset, the MXCC will issue a reset to the SSP
and clear the WD bit. This bit is not cleared by the
software internal reset and remains set after the
software internal reset.
16.5.6 MXCC MBus Port Register


The MBus port register is a 32-bit register used in an MBus system to identify
the processor number and to define the base address of the configuration
space of each processor. It is read-only.


Figure 16-14. MXCC MBus Port Register 


| reserved | mid | reserved | mdev | mrev | mvend |
31     28  27 24  23     16  15  8  7  4  3    0


The fields of the MBus Port Register are: 


reserved Reserved. Read as zero and ignored on write. 

mid Module ID from the MID[3:0] pins. 

mdev MBus device number. This eight-bit field is 
hard-wired to 0x01. 

mrev Device revision number. This field contains a
constant for each device revision. See the MXCC data
sheet for revision numbers.


mvend MBus Vendor number. This field contains the con- 
stant 0x4. 
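A sketch of decoding the port register, using the field boundaries as read from Figure 16-14 (mid in 27:24, mdev in 15:8, mrev in 7:4, mvend in 3:0). The struct and function names are hypothetical.

```c
#include <stdint.h>

struct mbus_port {
    unsigned mid;    /* module ID from the MID[3:0] pins */
    unsigned mdev;   /* MBus device number, hard-wired to 0x01 */
    unsigned mrev;   /* device revision number */
    unsigned mvend;  /* MBus vendor number, constant 0x4 */
};

struct mbus_port mbus_port_decode(uint32_t reg)
{
    struct mbus_port p;
    p.mid   = (reg >> 24) & 0xFu;
    p.mdev  = (reg >> 8) & 0xFFu;
    p.mrev  = (reg >> 4) & 0xFu;
    p.mvend = reg & 0xFu;
    return p;
}
```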
16.6 MXCC Block Copy Facility
The MXCC provides a facility to assist in the copying or zeroing of main
memory. Transfers are performed using the system bus's block transfer
operation. The transfers occur between memory and the stream data buffer. The
stream unit is situated in the bus command logic, which is part of the cache 
controller core (see Figure 16-3). 


Stream source and destination addresses are physical addresses and may be
cacheable or non-cacheable. Cacheable accesses snoop the local caches 
and perform cacheable bus accesses that should snoop other caches in the 
system. 


To zero a block of memory, the stream data buffer is loaded with zeros. Then 
the destination address is stored into the stream destination register, which 
starts a block write of a sub-block from the stream data buffer to the memory 
address. A sub-block is 32 bytes in the MBus configuration and 64 bytes in the 
XBus configuration. If more than one sub-block of memory needs to be zeroed,
additional destination addresses are stored into the stream destination regis- 
ter. All stream accesses are sub-block aligned. 
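The zeroing sequence can be modeled in software to show the order of operations. In the sketch below, `struct stream_model` and its functions are hypothetical stand-ins for control space stores to the stream data buffer and the stream destination register; a real driver would issue ASI 0x02 stores instead of calling `memcpy`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the stream unit; not an MXCC or OS API. */
struct stream_model {
    unsigned sub;     /* sub-block size in bytes: 32 (MBus) or 64 (XBus) */
    uint64_t buf[8];  /* 64-byte stream data buffer, written by double word */
    uint8_t *mem;     /* simulated physical memory */
};

/* Models a store to the stream destination register. */
static void stream_dest(struct stream_model *m, uint64_t pa)
{
    uint64_t aligned = pa & ~(uint64_t)(m->sub - 1); /* low bits ignored */
    memcpy(m->mem + aligned, m->buf, m->sub);        /* one sub-block */
}

/* Zero len bytes starting at pa (both assumed sub-block aligned). */
void stream_zero(struct stream_model *m, uint64_t pa, uint64_t len)
{
    /* Step 1: load zeros into the buffer, one double word per DBL value. */
    for (unsigned dbl = 0; dbl < m->sub / 8; dbl++)
        m->buf[dbl] = 0;
    /* Step 2: one destination-register store per sub-block. */
    for (uint64_t off = 0; off < len; off += m->sub)
        stream_dest(m, pa + off);
}
```

The buffer is loaded once; every further sub-block costs only one destination store.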


To copy a block of memory, a sub-block address is written into the stream 
source register, and then a sub-block address is written into the stream desti- 
nation register. This copies a sub-block from the source address to the desti- 
nation address via the stream data buffer. Additional sub-blocks are copied in 
the same way. The stream hardware is interlocked to prevent a write to a 
stream register from completing while the register is still busy with a previously 
started operation. Therefore, a copy loop can write the stream source and des- 
tination register without checking whether the previous sub-block operation 
has completed. The block copy proceeds as fast as sub-blocks can be
transferred. When the stream copy source is not sub-block-aligned, the lower
bits are ignored, and the transfer completes on sub-block boundaries.
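The copy loop is the same idea with a source store before each destination store. Because the hardware interlocks the stream registers for processor accesses, the loop below does not poll the ready bit. As before, the model and its names are hypothetical; it assumes non-overlapping source and destination ranges that are sub-block aligned.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the stream unit; not an MXCC or OS API. */
struct stream_model {
    unsigned sub;     /* sub-block size in bytes: 32 (MBus) or 64 (XBus) */
    uint64_t buf[8];  /* 64-byte stream data buffer */
    uint8_t *mem;     /* simulated physical memory */
};

/* Models a store to the stream source register: memory -> buffer. */
static void stream_src(struct stream_model *m, uint64_t pa)
{
    uint64_t aligned = pa & ~(uint64_t)(m->sub - 1);
    memcpy(m->buf, m->mem + aligned, m->sub);
}

/* Models a store to the stream destination register: buffer -> memory. */
static void stream_dst(struct stream_model *m, uint64_t pa)
{
    uint64_t aligned = pa & ~(uint64_t)(m->sub - 1);
    memcpy(m->mem + aligned, m->buf, m->sub);
}

void stream_copy(struct stream_model *m, uint64_t dst, uint64_t src,
                 uint64_t len)
{
    for (uint64_t off = 0; off < len; off += m->sub) {
        stream_src(m, src + off); /* hardware stalls a later store until */
        stream_dst(m, dst + off); /* the previous operation completes    */
    }
}
```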


The block read facility may also prove useful in accomplishing a hardware-as- 
sisted software memory-scrubbing scheme. 


The stream source and destination registers are interlocked only for accesses 
from the processor; writing into the source or destination register from the
system bus before the previous operation is complete will have unpredictable
results. Bus accesses to the stream data buffer should be attempted only while
there is no processor-initiated stream operation in progress. 
16.6.1 Stream Data Buffer


The stream data buffer is accessible through the MXCC’s control space. This 
buffer is 64 bytes long and is read or written as double-words. The particular 
double-word of the buffer is selected by the DBL field of the control space
address (see Figure 16-8). All stream accesses to the data buffer (see
Subsections 16.6.2 and 16.6.3) transfer a sub-block of data. A sub-block is 64 bytes
in XBus configurations and 32 bytes in MBus configurations. Only the first 32 
bytes of the data buffer are used in MBus configurations. 


16.6.2 Stream Source Register 


The stream source register is a read/write register that controls stream read
operations and reports their status. When an address is written into the
register, it triggers a read of a sub-block from that physical address into the stream
data buffer. A sub-block is 32 bytes on MBus and 64 bytes on XBus. The
low-order address bits are ignored, and the transfers are always sub-block aligned.


The format of the stream source and destination registers is shown in 
Figure 16-15. 


Figure 16-15. MXCC Stream Source / Destination Register 


| RDY | res | C | PA |
63 62 37 36 35 0
RDY Ready. This bit indicates that the previous stream
operation is completed. This bit is ignored when written.
res Reserved. Ignored on write and read as zero.
C Cacheable. The C bit specifies whether the operation
is to (or from) cacheable space. Cache-coherence
checks are applied only to cacheable accesses. The
access is to non-cacheable space when the C bit is
clear.


PA Physical Address. Physical address bits 35 to 6 for 
XBus or bits 35 to 5 for MBus of the source or destina- 
tion. All transfers started from the Stream Source 
Register or Stream Destination Register are sub- 
block aligned in memory and in the stream data buff- 
er and, except when errors are encountered, transfer 
an entire sub-block. 
res Reserved. Ignored for address generation. The bits
written may be read. The size of this field varies with 
the bus configuration and is six bits wide in XBus con- 
figurations and five bits wide in MBus configurations. 
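A sketch of forming the 64-bit value written to the stream source or destination register, using the layout in Figure 16-15 (ready bit 63, C bit 36, PA in bits 35:0). The macro and function names are hypothetical. Clearing the low-order bits is optional, since the hardware ignores them for address generation.

```c
#include <stdint.h>
#include <stdbool.h>

#define STREAM_RDY ((uint64_t)1 << 63)        /* read-only ready bit */
#define STREAM_C   ((uint64_t)1 << 36)        /* cacheable */
#define STREAM_PA  (((uint64_t)1 << 36) - 1)  /* PA[35:0] */

/* sub is the sub-block size: 32 (MBus) or 64 (XBus). */
uint64_t stream_reg_value(uint64_t pa, bool cacheable, unsigned sub)
{
    uint64_t v = pa & STREAM_PA & ~(uint64_t)(sub - 1); /* sub-block aligned */
    return cacheable ? (v | STREAM_C) : v;
}

/* True when a value read back shows the previous operation completed. */
bool stream_ready(uint64_t reg)
{
    return (reg & STREAM_RDY) != 0;
}
```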


16.6.3 Stream Destination Register 


The stream destination register is a read/write register that controls stream 
write operations and reports their status. When an address is written into the 
register, it triggers the write of a sub-block from the stream data buffer into that 
physical address. A sub-block is 32 bytes on MBus and 64 bytes on XBus. The
low-order address bits are ignored, and transfers are always sub-block aligned.


The format of the stream destination register, shown in Figure 16-15, is the
same as that of the stream source register.
16.7 MXCC Interrupts


The MXCC supports interrupts differently on MBus than on XBus. In MBus
configurations, system interrupts are sent to the MXCC on MIRL[3:0] pins and
passed unmodified to the SSP on IRL[3:0]. The MXCC does not generate any
interrupts.


In an XBus system, interrupts from the system come via the XBus as interrupt 
request packets (see Section 19.7) or from the BootBus through hardware 
interrupt-polling cycles (see Section 20.4). The interrupt level indicated in the 
command is decoded, and the corresponding bit in the Interrupt Pending Reg- 
ister is set. The MXCC generates level-15 interrupts to report asynchronous 
errors. All the pending interrupts that are not masked by the interrupt mask reg- 
ister are prioritized, and the highest pending interrupt level is sent to the SSP 
on IRL[3:0]. 
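The XBus prioritization described above can be modelled in a few lines of C. This is a minimal sketch, not the MXCC logic itself. It assumes bit n of the pending and mask values stands for level n (1 to 15), that a set mask bit blocks its level, and that 0 on IRL[3:0] means no interrupt. The function name `mxcc_irl` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of XBus interrupt prioritization. Bit n of `pending`
 * (Interrupt Pending Register) and of `mask` (Interrupt Mask Register)
 * stands for level n, 1..15. Returns the level driven on IRL[3:0], or 0
 * if no unmasked interrupt is pending. */
static unsigned mxcc_irl(uint16_t pending, uint16_t mask)
{
    assert((pending & 1u) == 0);                  /* there is no level 0 */
    unsigned live = (unsigned)pending & ~(unsigned)mask & 0xFFFEu;
    for (unsigned level = 15; level >= 1; level--) {
        if (live & (1u << level))
            return level;                         /* highest pending level wins */
    }
    return 0;
}
```

With levels 3 and 9 pending, the sketch presents 9. If bit 9 of the mask is then set, it presents 3 instead.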


In XBus configurations, the SSP can generate interrupts to be sent by XBus 
to the system. These interrupts are generated by writing to the interrupt gen- 
eration register in the MXCC. 


If the BootBus is not used in the XBus configuration, LDATA[3:0] should be 
pulled to ground to avoid generating unexpected interrupts. 


16.7.1 Interrupt Pending Register (XBus) 
Figure 16-16. XBus Interrupt Pending Register

  15               1    0
 |      IntLvl      |    |

IntLvl Interrupt Level. This field indicates pending interrupts
on each of the 15 interrupt levels. Bit n is set if an
interrupt is pending at level n. This field is read-only.


There is no level-0 interrupt, and thus never an interrupt pending at level 0. This
register is read-only and is cleared through the interrupt pending clear register
and by system reset. The values returned from reading the interrupt pending
register are not predictable in the MBus configuration.


16.7.2 Interrupt Mask Register (XBus) 
Figure 16-17. XBus Interrupt Mask Register

  15                    0
 |        INTMASK        |

INTMASK Interrupt Mask. This field is used to selectively mask
pending interrupts. If an interrupt is pending at level 
n and bit n of INTMASK is clear, the interrupt is priori- 
tized and passed to the SSP. If bit n is set, pending 
interrupts at level n are not passed on to the proces- 
sor. 


This register is readable and writable. It is set to all 1s by system reset (ASTIN).
The interrupt mask register has no effect on MBus configurations. 


16.7.3 Interrupt Pending Clear Register (XBus) 


Figure 16-18. XBus Interrupt Pending Clear Register

  15                    0
 |        INTCLR         |


INTCLR Interrupt Clear. This field is used to selectively clear 
pending interrupts. Writing a 1 into bit n of this register 
clears the pending interrupt at level n. 


This register is write-only. A TO bus error is reported to the SSP for a read ac- 
cess to this register. 
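The write-1-to-clear behavior of the interrupt pending clear register can be sketched as follows. The struct and function names are hypothetical, and the sketch assumes that 0 bits in the written value leave the corresponding levels untouched. The text states only that writing a 1 clears a level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of the pending register. */
struct irq_regs {
    uint16_t pending;   /* Interrupt Pending Register, bit n = level n */
};

/* Models a write to the Interrupt Pending Clear Register: each 1 bit in
 * `intclr` clears the pending interrupt at that level; 0 bits leave the
 * corresponding level untouched. */
static void intclr_write(struct irq_regs *r, uint16_t intclr)
{
    assert(r != NULL);
    r->pending = (uint16_t)(r->pending & ~(unsigned)intclr);
}
```

Because each level is cleared independently, a handler can acknowledge one level without disturbing interrupts that arrived at other levels in the meantime.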


Writing the interrupt pending clear register has no effect in MBus configura- 
tions. 


16.7.4 Interrupt Generation Register (XBus) 


The interrupt generation register is a 32-bit write-only location used to
generate system interrupts. It is accessible only from VBus. A write to the
interrupt generation register generates a two-cycle interrupt packet to the
XBus. The first cycle contains the interrupt command code. The content of the
interrupt generation register is in bits [31:0] of the second cycle. All 32 bits
are passed to the bus watcher (BW) for system-specific handling, but only the
low-order 15 bits are recognized by the MXCC when received from the BW in an
interrupt message. A TO bus error is reported to the SSP on a read access to
this register.


Figure 16-19. MXCC XBus Interrupt Generation Register

  31          15  14          0
 |     SYS      |     INT      |

The fields of the interrupt generation register are:


SYS System-Specific. Information passed to the bus 
watcher for system-specific use. It is not interpreted 
by either a sending or receiving MXCC or processor. 


INT Interrupt Bits. Bits corresponding to the interrupt levels.
Several bits may be set. If bit n is set, an interrupt is to be
communicated at level n+1. Thus bit 0 corresponds to level 1,
and bit 14 corresponds to level 15.


Writing to the Interrupt Generation Register has no effect in MBus configura- 
tions. 
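The following sketch builds a value for the interrupt generation register using the field layout in Figure 16-19: SYS in bits [31:15] and INT in bits [14:0], where bit n requests level n+1. The helper name `intgen_value` is hypothetical. The contents of SYS are system-specific and are treated here as an opaque 17-bit number.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper that composes an Interrupt Generation Register value:
 * SYS in bits [31:15] is opaque to the MXCC and passed to the bus watcher;
 * INT in bits [14:0] uses bit n to request level n+1. */
static uint32_t intgen_value(uint32_t sys, unsigned level)
{
    assert(level >= 1 && level <= 15);
    assert(sys < (UINT32_C(1) << 17));          /* SYS is 17 bits wide */
    return (sys << 15) | (UINT32_C(1) << (level - 1));
}
```

Several levels can be requested in one write by ORing values together. For example, `intgen_value(0, 3) | intgen_value(0, 12)` sets bits 2 and 11.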
16.8 MXCC Error Handling


16.8.1 Bus Errors 
The MXCC may encounter errors in operation. These can come from the bus,
from parity checking on E-cache or bus data, from internal operations, or from
illegal access.


Errors are handled in four different ways in the MXCC:


❑ Errors logged to the MXCC's error register and reported to SuperSPARC
  through encoding of the control strobes MEXC, RADY (or WADY), and
  RETRY (see Table 18-1). These are:
  ■ Errors on a read or an LDST operation,
  ■ Store exception pending condition of a write miss,
  ■ Data parity errors on VBus when the SSP is the master, and
  ■ Errors on a demap initiated by the SSP.

❑ Errors reported to SuperSPARC through a level-15 interrupt (in XBus
  configurations only). Errors reported in this way are:
  ■ Asynchronous errors, which include errors of operations that have
    been acknowledged by the MXCC to SuperSPARC. Examples include
    stream operations for Block Copy/Zero, shared writes in the XBus
    configuration, and non-cacheable writes in which errors occur later
    in the operation.
  ■ Data parity errors on the VBus when the MXCC accesses external
    cache for an incoming bus request.

  All these errors are logged into the error register of the MXCC. In the
  MBus configuration, these types of errors are reported to the system by
  asserting AERR.

❑ Errors reported to the system by asserting CCERR. Errors reported
  in this way are:
  ■ XBus parity errors,
  ■ cache consistency errors, and
  ■ VBus parity errors on a flush operation.

  These errors are considered catastrophic. They are logged into the error
  register of the MXCC before CCERR is asserted.

❑ Errors neither reported nor logged. For example, errors on the MXCC
  prefetch operation are ignored.


Bus errors are signalled when an access cannot be satisfied for hardware rea- 
sons. This categorization excludes such MMU-detected conditions as protec- 
tion violations. 
The MXCC has four bus error codes. The MXCC preserves the four codes in
signalling and logging the errors but does not differentiate the codes in any
other way. Thus, the meanings that a system actually assigns to the codes may
differ from those suggested here.

The codes are: 

TO Timeout. This code is returned when the addressed
location does not return any answer after some fixed
amount of time.

BE Bus Error. This code is returned when the addressed 
location rejects the required action because it is ille- 
gal. 

UC Uncorrectable Error. This code is returned when the 


addressed location rejects the required action be- 
cause of an internal failure. 


UD Undefined (other) Error. This code is returned for
errors that cannot be classified under the other codes.


Bus Errors on Instruction Fetches 


An instruction fetch accesses an instruction that will be executed. If a bus error
is encountered on an instruction fetch, an instruction_access_exception trap
is taken by the SSP when the instruction reaches the execution stage. Errors
on the system bus are passed through the MXCC to the SSP and logged in CCER.


The SSP's Fault Status Register (see Subsection 9.12.3) contains the bus er- 
ror code in the UD, UC, TO, and BE bits, with FT=5, AT=2 or 3, and FAV=0. 
The virtual address that encountered the error is the trap PC (see Chapter 12). 
The fault address register (see Subsection 9.12.4) is not updated. 


Bus Errors on Instruction Prefetches 


An instruction prefetch accesses an instruction that may or may not be 
executed in the future. If a bus error is encountered on an instruction prefetch, 
the prefetch operation is aborted, and the error is ignored. 


Bus Errors on Data Loads 


A data load is the access made by a load instruction. If a bus error is
encountered on a data load, a data_access_exception trap is taken by the SSP.
Errors on the system bus are passed to the SSP by the MXCC and logged into the
CCER.
The SSP's Fault Status Register (MFSR) (see Subsection 9.12.3) contains the
bus error code in the UD, UC, TO, and BE bits with FT=5, AT=0-3, and FAV=0.
If the erroring access is through an ASI other than 0x8-0xB and 0x20-0x2F,
MFSR.CS is set to 1. The virtual address that encountered the error is saved
in the SSP's fault address register (MFAR) (see Subsection 9.12.4).
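The MFSR.CS rule above reduces to an ASI range check. The predicate below is a hypothetical illustration of that rule, not code taken from any SuperSPARC software. It assumes 8-bit SPARC V8 ASI values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical predicate: true when a bus error on an access through `asi`
 * is expected to set MFSR.CS, i.e. the ASI lies outside 0x08-0x0B and
 * 0x20-0x2F. */
static bool mfsr_cs_expected(unsigned asi)
{
    assert(asi <= 0xFFu);                        /* SPARC V8 ASIs are 8 bits */
    bool listed = (asi >= 0x08u && asi <= 0x0Bu) ||
                  (asi >= 0x20u && asi <= 0x2Fu);
    return !listed;
}
```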


Bus Errors on Synchronous Data Stores 


A synchronous data store is an access made by a store instruction that cannot
be buffered in the SSP's store buffer. The LDSTUB, LDSTUBA, SWAP, and
SWAPA instructions always perform synchronous stores. In addition, STA (or
STDA) instructions to ASIs other than 0x08-0x0B and 0x20-0x2F are also
synchronous. See Section 10.6 for more information on synchronous stores.
The MXCC passes errors on the system bus to the SSP and logs the error in
CCER.
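The classification of synchronous stores above can be expressed as a small decision function. This sketch assumes the store buffer is enabled, so ordinary stores are buffered. The enumeration `store_op` and the function name are hypothetical labels for the instruction classes named in the text.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical store-kind enumeration for illustration only. */
enum store_op { ST_PLAIN, ST_ALTERNATE, LDSTUB_OP, LDSTUBA_OP, SWAP_OP, SWAPA_OP };

/* Returns true when a store cannot be buffered and is performed
 * synchronously, assuming the store buffer is enabled. `asi` is only
 * meaningful for ST_ALTERNATE (STA/STDA). */
static bool store_is_synchronous(enum store_op op, unsigned asi)
{
    switch (op) {
    case LDSTUB_OP: case LDSTUBA_OP: case SWAP_OP: case SWAPA_OP:
        return true;                           /* atomics are always synchronous */
    case ST_ALTERNATE: {
        assert(asi <= 0xFFu);
        bool bufferable = (asi >= 0x08u && asi <= 0x0Bu) ||
                          (asi >= 0x20u && asi <= 0x2Fu);
        return !bufferable;
    }
    case ST_PLAIN:
    default:
        return false;                          /* buffered in the store buffer */
    }
}
```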


The SSP's MFSR (see Subsection 9.12.3) contains the bus error code in the
UD, UC, TO, and BE bits, with FT=5, AT=4-7, and FAV=1. If the erroring
access is through an ASI other than 0x8-0xB and 0x20-0x2F, MFSR.CS is also
set. The virtual address that encountered the error is saved in the SSP's fault
address register (MFAR) (see Subsection 9.12.4).


Bus Errors on Asynchronous Data Stores 
An asynchronous data store is any store buffered in the SSP's store buffer.
(See Chapter 8 and Section 10.6.) The handling of bus errors on asynchronous
data stores depends on the state of the SSP's store buffer and on whether the
error is early or late.


An early error is an error that is signalled to the SSP before the bus operation
is effectively started by the MXCC. There are two cases of early store errors:


❑ A memory store that misses in the external cache and for which the cache
  receives a bus error indication from memory when it tries to load the
  missing cache block.

❑ A VBus parity error.

If the SSP's store buffer is disabled and an early error is signalled, the error is
handled as a synchronous data store error.


If the SSP's store buffer is enabled and an early error is signalled, the processor
takes a data_store_error trap. The MFSR.SB bit is set, but the bus error code
is not logged in MFSR, and MFAR is not updated.


A late error is an error that is signalled to the SSP after the bus operation has 
been started by the MXCC and acknowledged to the SSP. Late errors occur 
only on non-cacheable stores. 
If a late error is signalled in an XBus configuration, the MXCC signals a
level-15 interrupt to its directly connected SSP. The CCER contains the error
information with the AE bit set to 1, and the DCmd subfield of CCER.CCOP is
set from the operation. In MBus configurations, the AERR signal is asserted
instead of a level-15 interrupt being signalled.


The value of CCCR.MC (multiple command enable) does not affect error han- 
dling of store operations. 
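The handling of bus errors on asynchronous stores depends on three facts: whether the error is early or late, whether the store buffer is enabled, and which bus is configured. The sketch below summarizes that decision. The enumeration and function names are hypothetical. The `noncacheable` check encodes the statement that late errors occur only on non-cacheable stores.

```c
#include <assert.h>
#include <stdbool.h>

enum store_err_outcome {
    HANDLED_AS_SYNC_STORE_ERROR,  /* early error, store buffer disabled */
    DATA_STORE_ERROR_TRAP,        /* early error, store buffer enabled  */
    LEVEL15_INTERRUPT,            /* late error, XBus configuration     */
    AERR_ASSERTED                 /* late error, MBus configuration     */
};

/* Hypothetical summary of the rules above. `late` is true when the error is
 * signalled after the MXCC has started and acknowledged the store. */
static enum store_err_outcome async_store_error(bool late, bool store_buffer_on,
                                                bool xbus, bool noncacheable)
{
    if (late) {
        assert(noncacheable);     /* late errors occur only on NC stores */
        return xbus ? LEVEL15_INTERRUPT : AERR_ASSERTED;
    }
    return store_buffer_on ? DATA_STORE_ERROR_TRAP : HANDLED_AS_SYNC_STORE_ERROR;
}
```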


Bus Errors on Block Copy Operations 


Block copy operations are initiated by using the MXCC's stream registers (see
Section 16.6). If a bus error is signalled during a block copy operation, the
MXCC signals a level-15 interrupt to its directly connected SSP. The MXCC
error register (CCER) is updated with the error information, and CCER.AE is
set to 1 to indicate an asynchronous error.


Block copy errors may be differentiated from late asynchronous errors by the
value logged in the DCmd subfield of CCER.CCOP. Table 16-14 shows
CCER.CCOP.DCmd values, the operation in progress, and the XBus packet
name that indicates the error if using XBus. The DCmd field is set to the same
values when in an MBus configuration.


Table 16-14. DCmd Field of Block Copy and Asynchronous Errors

| DCmd | Operation                | XBus Packet Name   |
| ?    | ?                        | Stream Read Reply  |
| ?    | ?                        | Stream Read Reply  |
| ?    | ?                        | Stream Write Reply |
| ?    | Asynchronous Store Error | ? Write Reply      |
Bus Errors on MMU TLB Operations 


An MMU TLB operation is a memory reference initiated by the SSP's memory
management unit to load entries into its TLB or to set the modified or
referenced bits of a PTE (see Chapter 9). If a bus error is signalled during an
MMU TLB operation, the SSP takes either an instruction_access_exception trap
or a data_access_exception trap, depending on whether the operation was
initiated using ASIs 0x08-0x09 (instruction fetch ASIs) or another ASI. Errors
on the system bus are passed through the MXCC to the SSP and are logged in
the CCER.
If an instruction_access_exception trap is taken, the SSP's MFSR is updated
with the bus error code in the UD, UC, TO and BE bits, FT=4, and FAV=0. The 
virtual address of the error is the trap PC, which is saved as normal for the trap. 
The MFAR is not updated. 


If a data_access_exception trap is taken, the SSP's MFSR is updated with the
bus error code in the UD, UC, TO, and BE bits, FT=4 and FAV=1. If the access
was through an ASI other than 0x08-0x0B or 0x20-0x2F, MFSR.CS is set to
1. The virtual address of the erroring operation is saved in the MFAR.
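The trap selection and MFSR settings for bus errors during MMU table walks can be summarized in a small function. This is a hypothetical sketch of the rules stated above (ASIs 0x08-0x09 select the instruction trap; FT is 4 in both cases; FAV is set only in the data case). The struct and function names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

struct tlb_fault {
    bool instruction_trap; /* instruction_access_exception vs data_access_exception */
    unsigned ft;           /* MFSR.FT                                    */
    bool fav;              /* MFSR.FAV: MFAR holds the faulting address  */
};

/* Hypothetical summary of bus-error handling for MMU TLB operations. */
static struct tlb_fault tlb_walk_bus_error(unsigned asi)
{
    assert(asi <= 0xFFu);
    struct tlb_fault f;
    f.instruction_trap = (asi == 0x08u || asi == 0x09u);
    f.ft = 4;
    f.fav = !f.instruction_trap;   /* MFAR is updated only for the data case */
    return f;
}
```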


Illegal access is an access to an internal MXCC register or memory with an
invalid address or operation. Examples of illegal accesses from VBus include:


❑ Atomic load-store to BootBus or the MXCC registers.

❑ Out-of-range control space access.

❑ Read of the interrupt generation register.

Invalid access from either the SSP or VBus can be reported as:

❑ A TO error to SuperSPARC.

❑ A TO error if the system bus has timeout logic. (Illegal accesses from the
  system bus side are ignored.)


Bus Errors on Outgoing Transactions 
In the MBus configuration, an error on an outgoing request is reported back
to the MXCC with the MBus acknowledgment type by encoding the type of the
error into MRDY, MERR, and MRTY, as shown in Table 17-3.


In the XBus configuration, an error on an outgoing request is reported to the
MXCC in two different ways (see Subsection 19.5.6):


❑ The error bit in the header cycle of the reply packet is set. The second
  cycle carries the error code in the three least significant bits.

❑ Odd parity is used on a data cycle to indicate a memory fault data cycle.
  In this case, the three least significant bits of the memory fault data
  cycle contain the error code.


These errors are signalled to the local SSP (which initiated this request), as 
described above according to the type of access. 
Parity Errors on Incoming Requests


In MBus and XBus configurations, if a parity error occurs on the VBus when 
the MXCC accesses external cache in response to an incoming bus request, 
a VBus parity error will be reported to the requestor as an uncorrectable error. 


Parity Error on VBus when the Processor is Master 


A parity error on VBus when SuperSPARC is the bus master is reported to Su- 
perSPARC as an undefined error. 


16.8.2 MXCC Level-15 Interrupt and CCERR/AERR


In XBus configurations, the MXCC signals a level-15 interrupt to the locally
connected processor when an asynchronous error is detected. The asynchronous
error may have occurred on an operation initiated by the local SSP (for example,
a late error in an asynchronous data store), or on an operation initiated from
another processor or an I/O device but detected by the MXCC (for example, an
E-cache parity error on data accessed because the E-cache is the owner).


When a local level-15 interrupt is issued, the error information is logged in the 
CCER. 


In MBus configurations, these same errors cause the AERR signal to be asserted
to signal the error to the system. AERR in MBus configurations is the same pin
as CCERR in XBus configurations.


Some errors are always reported to the system by asserting CCERR or AERR.
Errors reported in this way are: XBus parity errors, cache-consistency errors,
and VBus parity errors on a flush operation. These errors are considered
catastrophic. They are also logged into the error register of the MXCC.


16.8.3 Errors Detected by MXCC 


The MXCC detects parity errors on both XBus and VBus. It can also detect 
cache-consistency errors in XBus configurations. 


External Cache Parity Error 


A parity error in the external cache can be discovered in three cases, each of
which is handled differently.


❑ If another processor reads data from the E-cache because the E-cache
  is the owner, a parity error is signalled to the bus as an uncorrectable (UC)
  error. In XBus configurations, the locally attached processor is signalled
  with a level-15 interrupt. In MBus configurations, the error is reported by
  asserting the AERR signal. The error information is logged in CCER; CP
  is set to 1.

❑ If the MXCC detects an E-cache parity error when trying to write back a
  block to main memory due to the replacement algorithm, it proceeds with
  the write of the incorrect data to memory. In XBus configurations, it signals
  the locally attached processor with a level-15 interrupt. In MBus
  configurations, AERR is asserted. The error information is logged in the
  CCER; CP is set to 1.

❑ If an SSP detects an E-cache parity error during a read from its own
  E-cache, the error condition is handled as if a bus error indication UC had
  been signalled on the read operation. The SSP's MFSR.P bit is set to 1.


The first two cases can be distinguished by the value logged into the DCmd
subfield of CCER.CCOP. The DCmd subfield is 0x06 for the second case
(E-cache parity error on write back) only.


SSP Write Parity Error


If an E-cache (VBus) parity error is detected by the MXCC during a write by the
SSP to the E-cache, an early bus error indication of an undefined (UD) error
is signalled to the processor. The VP bit in the CCER is set to 1.


XBus Parity Error


When the MXCC detects an XBus parity error, it sets the XP bit of the CCER
and asserts the CCERR signal to the system. See Chapter 19.


Cache Consistency Errors 


Cache-consistency errors are detected by using the encodings of the XBus
packet's Tag Command (TCMD) field that indicate the expected tag value. A
mismatch between the actual tag value and the expected value triggers a
cache-consistency error. When the MXCC detects a cache-consistency error, it
sets the CC bit in the CCER and asserts the CCERR signal to the system.
Proper operation of the MXCC after a cache-consistency error is not assured;
system logic should reset the MXCC after this error occurs.


Cache-consistency errors cannot occur in MBus configurations. 


16.8.4 MXCC Error Register (CCER) 
The MXCC error register is a 64-bit register that records the address and other
relevant information from a transaction that generates an error. The error
register is readable, writable, and JTAG-scannable. It is not cleared at system
reset (ASTIN).
The format of the MXCC Error Register is shown in Figure 16-20. Bits in this
register are cleared by writing 1s to the relevant field. This allows the system
to clear previously recognized errors by storing the previously read contents
back into the register. The CCOP, ERR, S, and PA fields are latched on the first
occurrence of a CC, CP, or AE error. This information is not updated on
subsequent errors until the EV bit is cleared.
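The logging rules above (ME on a repeated error class, latching of the detail fields on the first CC, CP, or AE error until EV is cleared, and write-1-to-clear) can be modelled as plain functions on a 64-bit value. This is a simplified sketch. The single-bit positions follow Figure 16-20, but the CCOP, ERR, S, res, and PA fields are treated as one opaque "info" field in bits [56:0], which is an assumption made for brevity. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Single-bit CCER flags; bit positions follow Figure 16-20. */
#define CCER_ME (UINT64_C(1) << 63)
#define CCER_XP (UINT64_C(1) << 62)
#define CCER_CC (UINT64_C(1) << 61)
#define CCER_VP (UINT64_C(1) << 60)
#define CCER_CP (UINT64_C(1) << 59)
#define CCER_AE (UINT64_C(1) << 58)
#define CCER_EV (UINT64_C(1) << 57)
/* Simplification: CCOP, ERR, S, res and PA lumped together as bits [56:0]. */
#define CCER_INFO (CCER_EV - 1)

/* Hypothetical model of the MXCC logging an error of class `flag`;
 * `info` is the CCOP/ERR/S/PA snapshot of the failing operation. */
static uint64_t ccer_log(uint64_t ccer, uint64_t flag, uint64_t info)
{
    assert(flag == CCER_XP || flag == CCER_CC || flag == CCER_VP ||
           flag == CCER_CP || flag == CCER_AE);
    ccer |= (ccer & flag) ? CCER_ME : flag;     /* a repeated class sets ME */
    bool latches = flag == CCER_CC || flag == CCER_CP || flag == CCER_AE;
    if (latches && !(ccer & CCER_EV))           /* first CC/CP/AE error only */
        ccer = (ccer & ~CCER_INFO) | CCER_EV | (info & CCER_INFO);
    return ccer;
}

/* Models a software store: every 1 bit in `value` clears that bit. */
static uint64_t ccer_sw_write(uint64_t ccer, uint64_t value)
{
    return ccer & ~value;
}
```

In this model, storing back a previously read value clears exactly the errors that were read. Anything logged afterwards, such as an ME bit set by a later error, survives.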


Figure 16-20. MXCC Error Register (CCER)

 | ME | XP | CC | VP | CP | AE | EV | CCOP | ERR | S |
   63   62   61   60   59   58   57   56  46   38


The fields of the error register are:


ME Multiple Errors. This bit indicates that multiple errors 
of the same type occurred. ME is set when an error 
occurs that would set XP, CC, VP, CP, or AE with that 
bit already set. 


XP XBus Parity Error. When a parity error is detected on
the XBus, this bit is set to 1 if it is not previously set.
If previously set, ME is set instead. CCERR is also
asserted.


CC Cache-Consistency Error. When the MXCC detects
an unexpected E-cache tag status, this bit is set to 1
if not previously set. If previously set, the ME bit is set
to 1 instead. If the EV bit is not already set when the
CC error occurs, it is set now, and the parity bits are
stored in ERR. The CCOP, S, and PA fields are set to
reflect the operation code, the supervisor state, and
the physical address of the command that caused the
error. CCERR is also asserted.


VP VBus Parity Error when the SSP is the bus master.
When parity errors are detected on the VBus during
a processor-initiated write operation, this bit is set to
1 if not previously set. If previously set, the ME bit is
set to 1 instead. The MXCC reports the error to
SuperSPARC with an undefined error (MEXC = H,
WARDY = L, RETRY = L) acknowledgment on VBus
(see Chapter 18).
CP VBus Parity Error with the MXCC as the bus master.
When parity errors are detected on VBus for an
E-cache access initiated from the system bus, this bit
is set to 1 if not previously set. If previously set, the
ME bit is set instead. If the EV bit is not already set
when the CP error occurs, it is set now, and the parity
bits are stored in ERR. The CCOP, S, and PA fields
are set to reflect the operation code, the supervisor
state, and the physical address of the command that
caused the error. An error reply with the error code
set to uncorrectable error is then sent to the
requester. In XBus configurations, a level-15 interrupt
is sent to the SSP. In MBus configurations, AERR is
asserted.


AE Asynchronous Error. When an error occurs on a write
or stream operation for which the MXCC has previously
sent an acknowledgement to the SSP, this bit is set to
1 if not previously set. If previously set to 1, the ME
bit is set instead. If the EV bit has not already been
set when the AE error occurs, it is set now. The CCOP,
S, and PA fields are set to reflect the error code. In
XBus configurations, a level-15 interrupt is sent to
the SSP. In MBus configurations, AERR is asserted.


EV Error Information Valid. This bit is set when the
CCOP, ERR, S, and PA fields are loaded with
information about an error. These fields are updated
with information from a new error condition only when
EV is clear. Error handling should clear EV after
reading the error information.


CCOP MXCC Operation Code. The MXCC operation code
of the command that incurred the error. In the XBus
configuration, CCOP contains bits from the header
cycle of a reply command. See Subsection 16.8.5.
ERR Error code. This field is used to log the error code of
an XBus reply packet or MBus acknowledgement, or
the parity bits on the VBus when the MXCC is the
master for CP errors. The error codes from bus errors
appear in ERR[2:0] and are encoded as shown in
Table 16-15. Parity bits are logged in ERR, with a
one-to-one correspondence with the DPAR[0:7]
signals, where ERR[7] corresponds to the DPAR[0] pin.


Table 16-15. CCER.ERR Error Codes


S Supervisor bit. This bit records the state of the SU
signal on VBus for commands initiated by the SSP or
the SU bit in MBus command words from other
processors. It is set on CP and AE errors. This is used
only in the MBus configuration. In XBus configurations,
it should be ignored when reading the error register.
res Reserved. Read as zero and ignored on writes. 
PA Physical Address of the access causing the error. 


Errors occurring on the MBus during writebacks cause the last writeback ad- 
dress to be latched, not necessarily the address that got the error. 


16.8.5 CCOP Sub-fields 


The CCOP field of CCER has additional substructure. The subfields of CCOP 
are shown in Figure 16-21. 


Figure 16-21. Format of CCOP Field 
DCmd Data Command. This command is derived from the
packet header. The low-order bit of this field indicates
that the command was a reply. If this bit is clear, the
command was a request. The encoding of this field
is given in Subsection 19.7.1.
PL Packet length (0 is two-cycle, and 1 is one-cycle).
Xdst Destination ID. See Chapter 19, XBus, for more de- 
tails. 


16.8.6 Interpretation of CCER After Errors 


After an error, CCER logs the cause of the error. If CCER.ME is set, the CCER
contains information on more than one error. If a single error is logged,
interpretation is simpler. The main decoding of the error is performed by
examining the XP, CC, VP, CP, and AE bits. If ME is not set, only one of these
bits should be set. The paragraphs below describe how to decode the
information in CCER in each of these cases.


It is important to read the CCER as soon as practical and clear the error
indicators by storing the value read back into the CCER. The errors can then be
processed from the value already read from CCER. Errors occurring later are
logged into CCER without interfering with the interpretation of the previously
read error information.
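The read, clear, and decode sequence recommended above might look like the following in a handler. This is a sketch under stated assumptions. The register is simulated by a variable (`fake_ccer`), because the CCER access address is not given in this section. `ccer_load`, `ccer_store`, `ccer_service`, and the `ccer_class` enumeration are hypothetical names. With ME clear only one class bit should be set; the order of the checks is an arbitrary choice that tests the catastrophic cases first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CCER_ME (UINT64_C(1) << 63)
#define CCER_XP (UINT64_C(1) << 62)
#define CCER_CC (UINT64_C(1) << 61)
#define CCER_VP (UINT64_C(1) << 60)
#define CCER_CP (UINT64_C(1) << 59)
#define CCER_AE (UINT64_C(1) << 58)

enum ccer_class { CCER_NONE, CCER_XBUS_PARITY, CCER_CACHE_CONSISTENCY,
                  CCER_PROC_PARITY, CCER_CACHE_PARITY, CCER_ASYNC };

/* Stand-in for the hardware register (hypothetical). */
static uint64_t fake_ccer;
static uint64_t ccer_load(void) { return fake_ccer; }
static void ccer_store(uint64_t v) { fake_ccer &= ~v; }   /* write-1-to-clear */

/* Read once, clear exactly what was read, then decode the saved copy. */
static enum ccer_class ccer_service(uint64_t *saved, bool *multiple)
{
    assert(saved != NULL && multiple != NULL);
    uint64_t v = ccer_load();
    ccer_store(v);              /* later errors now log into a cleared CCER */
    *saved = v;
    *multiple = (v & CCER_ME) != 0;
    if (v & CCER_CC) return CCER_CACHE_CONSISTENCY;   /* MXCC must be reset */
    if (v & CCER_XP) return CCER_XBUS_PARITY;
    if (v & CCER_CP) return CCER_CACHE_PARITY;
    if (v & CCER_AE) return CCER_ASYNC;
    if (v & CCER_VP) return CCER_PROC_PARITY;
    return CCER_NONE;
}
```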


Asynchronous Errors (AE) 


When the AE bit is set, an asynchronous error has occurred. If neither the CC
nor the CP bit is also set, ME is clear, and EV is set, then the CCOP, S, and PA
fields contain information on the operation that encountered the error.


Table 16-16 indicates some of the most common of the possible values in the
CCER.CCOP.DCmd field after an asynchronous error is logged (AE=1).


Table 16-16. DCmd Field After Asynchronous Error


Cache Parity Errors (CP)


When the CP bit is set, an asynchronous error has occurred. If neither the CC 
nor the AE bit is also set, ME is clear, and EV is set, then the CCOP, S, and 
PA fields contain information on the operation that encountered the error. 


Table 16-17 indicates some of the most common of the possible values in the
CCER.CCOP.DCmd field after a cache parity error is logged (CP=1).


Table 16-17. DCmd Field After Asynchronous Error

| DCmd | Command            | Operation Causing Error                                        |
| ?    | NC Get Block Rqst  | ? (e.g., bcopy from another ...)                               |
| 0x06 | Flush Block Rqst   | Write back by E-cache to memory                                |
| ?    | Get Block Rqst     | XBus GetBlock (e.g., access by another MXCC to an owned block) |
| 0x10 | IO Get Single Rqst | Foreign read to cache data via XBus operation                  |

Processor Parity Errors (VP)


If the MXCC detects a VBus parity error when the SSP is writing to E-cache,
the MXCC sets VP and gives an early bus error indication of UD to the
processor. Other error information is not logged in CCER. The processor will
take a data access exception on the access and may retry the store.


XBus Parity Errors (XP) 


When the MXCC detects an XBus parity error, it sets the XP bit of the CCER. 
Other error information is not recorded in CCER. For full recovery or logging, 
external logic should log the failing bus transaction. 

Cache Consistency Errors (CC) 


When the CC bit is set, an asynchronous error has occurred. If neither the CP
nor the AE bit is also set, ME is clear, and EV is set, then the CCOP, S, and PA
fields contain information on the operation that encountered the error.


The TCMD that caused this error is not preserved in CCER.


The MXCC should be reset after a cache-consistency error is detected, since
proper operation is not assured after a cache-consistency error.


Subject to Change Without Notice 







Chapter 17 


MBus 







The SPARC MBus is a high-speed interface that connects SPARC processor 
modules to physical memory and I/O modules. The MBus is not intended for
use as a general expansion bus on a system back-plane spanning numerous 
boards. Rather it is intended to operate in a carefully controlled geographical 
area with the interconnect and associated circuitry located on only one printed 
wiring board (PWB). Processors may be located on the main PWB or on a 
small module. Modules consist of one or more integrated circuits, on one (or 
more) of which the MBus interface is contained. 


This chapter covers MBus operation in two SuperSPARC configurations: con- 
necting a SuperSPARC processor (SSP) directly to the MBus and using a 
module containing an SSP and a MultiCache Controller (MXCC) chip. In each 
configuration we consider how the processor or module behaves as a bus 
master, bus slave, and bus snooper. 
17.1 MBus Overview


MBus is a SPARC International standard designed to offer
processor-independent connections between one or more processors and memory.
MBus standard connectors on a system board can accept MBus processor modules,
offering performance-upgradable systems. The MBus Specification defines the
MBus logical and electrical protocols and is available from SPARC International.


MBus is a high-performance bus. It is fully synchronous; all transfers are 
controlled by an MBus clock, which can be up to 40 MHz. It supports block 
transfers in sizes up to 128 bytes with a peak transfer rate of 320 MBps. 
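The quoted peak rate follows from the bus width and the clock. This check assumes one 64-bit (8-byte) data transfer per MBus clock during a burst and ignores the command cycle and arbitration:

```latex
8\ \text{bytes/cycle} \times 40 \times 10^{6}\ \text{cycles/s}
  = 320 \times 10^{6}\ \text{bytes/s} = 320\ \text{MBps}
```

Under the same assumption, a 128-byte block transfer occupies 128 / 8 = 16 data cycles.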


MBus is a 64-bit bus. The 64 data lines are multiplexed, carrying a command 
word to start each transaction and data to complete it. The MBus command 
word contains the memory address of the transaction, the transaction type 
(e.g., read or write), its size (e.g., 1 byte or 32 bytes), and other information.


The unit that initiates a transaction on the bus is termed the “master.” The unit 
that is addressed in a transaction is termed the "slave." 


Each MBus system requires an MBus arbiter. Each unit that can be a master
has a request signal to the arbiter and a grant signal from the arbiter. Units can
use the bus when granted if the bus is free. MBus does not define the
arbitration algorithm to be used; it is system-dependent.


MBus is defined for uniprocessor and multiprocessor systems. The uniprocessor
form of MBus is termed “Level 1,” and the multiprocessor form “Level 2.”
SuperSPARC configurations (both with and without the MXCC) use the Level 
2 signals and protocols (except when all caches are disabled). This chapter 
will focus on Level 2. All discussion of MBus is assumed to be MBus Level 2 
unless explicitly marked as MBus Level 1. 


MBus Level 2 supports coherent cache multiprocessing. When the system 
consists of only caches and memory controllers designed for the MBus Level 
2 protocols, all caches are kept coherent on the bus. There are several 
compatibility requirements for coherent caches that must interoperate on 
MBus. See Subsection 17.4.4. 


The cache coherence protocol used in MBus is similar to the MOESI protocol 
used in Futurebus. This type of protocol differentiates between data that is 


shared and data that is held exclusively. It also differentiates between data that
is held unmodified and data that has been modified. 
Two special signals on MBus Level 2 are used in the operation of the
cache-coherence protocol: MSH (shared) and MIH (inhibit). MSH is asserted by any
cache that has the addressed data. MIH, when asserted on a read, inhibits the
memory from returning data. A cache that has the data will supply it instead.
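The following sketch shows how MSH and MIH could be derived when other caches snoop a coherent read. It uses MOESI-style line states, which separate shared from exclusive data and modified from unmodified data, as described above. Which cache asserts MIH is not spelled out here. The sketch assumes it is a cache holding the line in the OWNED or MODIFIED state, because memory's copy may then be stale. The types and the function name are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* MOESI-style line states. */
enum line_state { INVALID, SHARED, EXCLUSIVE, OWNED, MODIFIED };

struct snoop_result {
    bool msh;   /* some other cache holds the line                        */
    bool mih;   /* memory is inhibited; a cache supplies the data instead */
};

/* Hypothetical snoop of a coherent read by the other caches on the bus.
 * Assumption: a cache asserts MIH when it is OWNED or MODIFIED. */
static struct snoop_result snoop_read(const enum line_state *others, size_t n)
{
    assert(others != NULL || n == 0);
    struct snoop_result r = { false, false };
    for (size_t i = 0; i < n; i++) {
        if (others[i] != INVALID)
            r.msh = true;
        if (others[i] == OWNED || others[i] == MODIFIED)
            r.mih = true;
    }
    return r;
}
```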


Coherent transactions on MBus always have the transaction sizes set to 32 
bytes, the specified size of a cache sub-block. 
17.2 SuperSPARC MBus Configurations


Table 17-1 exemplifies several SuperSPARC design configurations.


Table 17-1. SuperSPARC Chipset MBus Configurations


The direct MBus configuration uses the SuperSPARC microprocessor without
the MXCC or external SRAMs. In this configuration, the SuperSPARC is
connected directly to MBus, as shown in Figure 17-1, and must run at the MBus
clock frequency.


Figure 17-1. Direct MBus System


The Full Module MBus system is diagrammed in Figure 17-2. The external 
cache memory provides significant performance improvement and greatly de- 
creases bus traffic in order to support more processors on a system bus. 


Figure 17-2. Full-Module MBus System 


In a non-MBus system, SuperSPARC supports external system bus interfaces
in its XBus configurations. See Chapter 19.


External cache (E-cache) RAM may not be needed in every application. This 
configuration is listed as the “MXCC no SRAM” configuration in Table 17-1. 


The E-cache is organized as a direct-mapped copy-back cache with a size of
1 Mbyte. This configuration is implemented with eight 128K x 8 or 128K x 9
synchronous SRAMs. The 128K x 9 SRAMs are needed to implement byte parity
on the E-cache data storage. Parity is directly supported by both SuperSPARC
and the MXCC.
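The capacity and the address split follow directly from this organization. As a minimal sketch (the macro and function names are hypothetical, and the code does not model the MXCC's actual tag or line layout), eight 128K-deep byte-wide SRAMs give 2^20 bytes, so in any direct-mapped 1-Mbyte cache the low 20 bits of the 36-bit physical address select a location and the upper 16 bits must be kept as the tag:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for a 1-Mbyte direct-mapped cache built from
   eight 128K-deep, byte-wide (x8) synchronous SRAMs. */
#define SRAM_DEPTH   (128u * 1024u)            /* locations per SRAM       */
#define SRAM_COUNT   8u                        /* eight x8 parts           */
#define ECACHE_BYTES (SRAM_DEPTH * SRAM_COUNT) /* 1,048,576 bytes = 2^20   */
#define PA_BITS      36u                       /* MBus physical address    */
#define OFFSET_BITS  20u                       /* log2(ECACHE_BYTES)       */

/* Location of the byte inside the cache data array. */
uint32_t ecache_offset(uint64_t pa) {
    assert(pa < (1ull << PA_BITS));
    return (uint32_t)(pa & (ECACHE_BYTES - 1u));
}

/* Tag bits that identify which physical address occupies that location. */
uint32_t ecache_tag(uint64_t pa) {
    assert(pa < (1ull << PA_BITS));
    return (uint32_t)(pa >> OFFSET_BITS);      /* PA[35:20], 16 bits */
}
```

Two physical addresses that differ only in bits 35:20, such as 0x100000 and 0x200000, map to the same location and displace each other.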


Synchronous SRAMs have registers on each input and output. This allows 
pipelined operation. An address is presented to an SRAM before the active 
clock edge, and it is registered in the SRAM at the clock edge. The SRAM 
reads out the addressed location before the next active clock edge, and the 
result is stored in an output register at the next edge. New addresses can be 
supplied at each clock edge, and new outputs appear after two clock periods 
of delay. Writing works similarly with address, data, output enable, and write- 
enable being registered on the active clock edge and stored in the internal 
array during the subsequent clock period. 
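The read pipeline described above can be modeled as two registers clocked on the same edge. This is a behavioral sketch only; the `sync_sram` type and `sram_clock` function are hypothetical, and writes are not modeled:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a pipelined synchronous SRAM read port. */
typedef struct {
    const uint8_t *array;   /* storage contents          */
    uint32_t addr_reg;      /* input (address) register  */
    uint8_t  out_reg;       /* output (data) register    */
} sync_sram;

/* One active clock edge. The output register captures the location read
   during the period that just ended (selected by the old address register),
   and the address register captures the address presented before this edge. */
uint8_t sram_clock(sync_sram *s, uint32_t next_addr) {
    assert(s != NULL && s->array != NULL);
    s->out_reg  = s->array[s->addr_reg];
    s->addr_reg = next_addr;
    return s->out_reg;
}
```

In this model, data for an address presented before edge n is in the output register after edge n+1, that is, two clock periods after the address was presented, while a new address is still accepted at every edge.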


The synchronous SRAMs used in the full module configurations are supplied 
by several manufacturers. 
17.3 MBus Signals


17.3.1 Physical Signal Summary 


Table 17-2 summarizes all the MBus physical signals. A bar over the signal 
name indicates the signal is active low (negative logic). A signal type of BS sig- 
nifies bi-state, TS signifies three-state, and OD signifies open-drain. 


Table 17-2. MBus Physical Signal Summary

Signal     | Description                  | Line Type | Signal Type
MAD[63:0]  | Address/Control/Data         | bussed    | TS
MRTY       | Transaction retry indicator  | bussed    | TS
MERR       | Error                        | bussed    | TS
MSH        | Shared                       | bussed    | OD
MIH        | Inhibit                      | bussed    | TS
MBR        | Bus request                  | dedicated | BS
MBG        | Bus grant                    | dedicated | BS
IRL[3:0]   | Interrupt level              | dedicated | BS
ID[3:0]    | Module identifier            | dedicated | BS
AERR       | Asynchronous error           | bussed    | OD
RSTIN      | Module reset                 | dedicated | BS
The MBus signals are as follows:

MCLK MBus Master Clock. 


MAD[63:0] Memory Address and Data. During the address phase,
MAD[35:0] carries the physical address, while
MAD[63:36] carries the information specific to the
transaction requested, as described in Subsection
17.3.2. During the data phase, MAD[63:0] contains
data as shown in Figure 17-3. Data must be aligned for
transactions involving fewer than eight bytes (a
double-word). For example, a word that has an even word
address must be sent on MAD[63:32], whereas an
odd-addressed word must be sent on MAD[31:0].
This is termed "big-endian" data order.
Figure 17-3. MBus Byte Alignment





MAD[63:48]    MAD[47:32]    MAD[31:16]    MAD[15:0]
half-word 0   half-word 1   half-word 2   half-word 3
word 0 (MAD[63:32])         word 1 (MAD[31:0])
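The alignment rule can be expressed as a lane calculation. The sketch below assumes that transfers are naturally aligned (a 2-byte transfer on a 2-byte boundary, and so on); `mad_lsb` and `mad_place` are hypothetical names. Byte 0 of a double-word travels on MAD[63:56] and byte 7 on MAD[7:0]:

```c
#include <assert.h>
#include <stdint.h>

/* Lowest MAD bit used by a naturally aligned transfer of `size` bytes
   (1, 2, 4 or 8) at physical address `pa`, in big-endian byte order. */
unsigned mad_lsb(uint64_t pa, unsigned size) {
    assert(size == 1 || size == 2 || size == 4 || size == 8);
    assert((pa & (size - 1u)) == 0);             /* natural alignment     */
    unsigned byte_in_dw = (unsigned)(pa & 7u);   /* byte 0 is MAD[63:56]  */
    return 64u - 8u * (byte_in_dw + size);
}

/* Place the low `size` bytes of `data` on the MAD lanes chosen by `pa`. */
uint64_t mad_place(uint64_t data, uint64_t pa, unsigned size) {
    uint64_t mask = (size == 8) ? ~0ull : ((1ull << (8u * size)) - 1u);
    return (data & mask) << mad_lsb(pa, size);
}
```

For example, a word at physical address 0x1000 (an even word address) occupies MAD[63:32], and a word at 0x1004 occupies MAD[31:0], matching the text.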


MAS Memory Address Strobe. This signal is asserted by 
the bus master during the first cycle of a bus transac- 
tion. This cycle is referred to as the "address cycle." 
All other timing is relative to MAS. For example, A+3 
refers to the third cycle after MAS is asserted. 


MRDY MBus Ready. This signal is used as one of the three
bits encoding the transaction status, as shown in
Table 17-3. If only MRDY is asserted, valid data has
been transferred. The three status bits (MRDY,
MRTY, and MERR) are normally asserted by the
addressed slave.


MRTY MBus Retry. This signal is used as one of the three 
bits encoding the transaction status, as shown in 
Table 17-3. If only MRTY is asserted, the slave 
wants the master to abort the current transaction im- 
mediately and start over. The master will relinquish 
the bus after this retry acknowledgement. 


MERR MBus Error. This signal provides one of the three bits
that encode the transaction status, as shown in 
Table 17-3. The encoding with only MERR asserted 
indicates that a bus error has occurred. 




Note: 


If any type of acknowledgement other than Valid Data Transfer is issued, the
cycle in which it is issued is the last cycle of the transaction, regardless of how
many more acknowledgement cycles would normally occur.
Table 17-3. Transaction Status Bit Encoding












MERR  MRTY  MRDY  Transaction Status (L = asserted)
 H     H     H    Idle cycle
 H     H     L    Valid data transfer
 H     L     H    Relinquish and retry (R&R)
 L     H     H    ERROR1: Bus error
 L     H     L    ERROR2: Timeout
 L     L     H    ERROR3: Uncorrectable
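A slave's acknowledgment can be decoded from the three status lines as in the following sketch. It assumes the bit patterns shown in Table 17-3, treats the combinations not listed there as undecoded, and uses hypothetical names throughout:

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    ACK_IDLE,        /* nothing asserted: idle or wait state  */
    ACK_VALID_DATA,  /* MRDY only                             */
    ACK_RR,          /* MRTY only: Relinquish and Retry       */
    ACK_ERROR1,      /* MERR only: bus error                  */
    ACK_ERROR2,      /* MERR and MRDY: timeout                */
    ACK_ERROR3,      /* MERR and MRTY: uncorrectable          */
    ACK_OTHER        /* remaining patterns, not decoded here  */
} mbus_ack;

/* Decode the logical (asserted = true) state of the three status lines. */
mbus_ack decode_status(bool merr, bool mrty, bool mrdy) {
    unsigned code = ((unsigned)merr << 2) | ((unsigned)mrty << 1) | (unsigned)mrdy;
    assert(code < 8u);
    switch (code) {
    case 0:  return ACK_IDLE;
    case 1:  return ACK_VALID_DATA;
    case 2:  return ACK_RR;
    case 4:  return ACK_ERROR1;
    case 5:  return ACK_ERROR2;
    case 6:  return ACK_ERROR3;
    default: return ACK_OTHER;
    }
}

/* The physical signals are active low: a low level (0) means asserted. */
mbus_ack decode_levels(int merr_n, int mrty_n, int mrdy_n) {
    return decode_status(merr_n == 0, mrty_n == 0, mrdy_n == 0);
}

/* Per the Note above, any acknowledgment other than Valid Data Transfer ends
   the transaction in the cycle in which it is issued. (Valid Data Transfer
   ends it only on the last expected MRDY, which this helper does not track.) */
bool ends_transaction(mbus_ack a) {
    return a != ACK_IDLE && a != ACK_VALID_DATA;
}
```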





MBR MBus Request. This signal is asserted by an MBus 
master to request bus ownership. There is a unique 
MBR from each master to the MBus arbiter. 


MBG MBus Grant. This signal is asserted by the external 
MBus arbiter when the particular master is granted 
the bus. There is a unique MBG signal per master. 


MBB MBus Busy. This signal is asserted as an output
during the entire transaction, from the initial assertion of
MAS to the assertion of the last MRDY or other
termination acknowledgement (such as an error
acknowledgement).


MIH Memory Inhibit. This is a Level 2 MBus signal. MIH is
asserted by the owner of a cache block after it
receives its address. This informs main memory that
the current CR or CRI request should be ignored.
See Section 17.8 for the system-dependent last MIH
cycle.


MSH Cache Block Shared. This is a level-2 MBus signal. 
Whenever a CR transaction appears, each module 
on the MBus should search its cache directory. If a 
valid copy is found, MSH should be asserted. MSH
timing is system-dependent; see Section 17.8. 
RSTIN Module Reset Input. This signal should reset all logic
on an MBus module to its initial state. At power-on,
RSTIN must be held low for at least 100 ms for both
SuperSPARC and the MXCC. See Chapter 13 for
more specific details.


AERR Module Asynchronous Error Detect. This signal is
asserted by the module to indicate that an internal error
was detected. SuperSPARC enters error mode
when any exception is taken with traps disabled
(PSR.ET = 0). Once in error mode, SuperSPARC
automatically performs a watchdog reset. AERR
remains asserted until the MFSR.EM bit is read and
thus cleared. See Section 12.5. AERR is also
asserted on a store buffer error; in that case it remains
asserted until MFSR.SB is cleared. See Subsection
10.6.5.


IRL[3:0] Interrupt Request Level. These pins carry the
interrupt request level to the processor. Each
SuperSPARC module receives a dedicated set of IRL[3:0].
The processing of these signals is described in
Section 12.7.


ID[3:0] Module Identifier. These pins carry the module
identifier. This identifier is asserted during the address
phase of every transaction on MAD[63:60]. The
identifier is also used to identify a unique address range for
module identification, initialization, and configuration.


17.3.2 MBus Command Word Signals


The MAD[63:00] signals contain multiplexed information. During the address
phase of the MBus transaction, when MAS is asserted, the master presents the
command word to the slave. During the data phase of the MBus transaction,
the MAD[63:00] signals are used to transmit data. Figure 17-4 shows the
format of the command word. The following paragraphs describe its fields.
Figure 17-4. MBus Command Word Format


MID[63:60] | S[59] | reserved[58:54] | VA[53:46] | M[45] | L[44] | C[43] | SIZE[42:40] | TYPE[39:36] | physical address[35:0]
MID MBus Module Identifier (MAD[63:60]). This field is
the ID[3:0] of the master for this transaction. The MID
is sourced by all MBus modules and allows slaves to
selectively reconnect after issuing a Relinquish and
Retry.


S Supervisor Access Indicator (MAD[59]). This signal
is asserted to indicate that the MBus transaction was
initiated by a processor in supervisor state. This is an
advisory bit; it is not used by MBus transactions but may be
of use to the slave device.

reserved Reserved (MAD[58:54]). These bits are driven high and are not used
at this time.

VA Virtual Address bits 19 through 12 (MAD[53:46]).
This field is used by some processors with virtually
indexed caches. This field is not used by the SSP or the
MXCC, as they use physically addressed caches.
These bits are driven high when the transaction
originates from an SSP or MXCC and are ignored by the
SSP or MXCC when snooping or when addressed as
a slave.


M Boot Mode/Local Bus Indicator (MAD[45]). This
signal is asserted by the processor module during the
address phase of boot mode transactions. The signal
is also asserted during local bus transactions. This is
an advisory bit and is not used by MBus transactions.
When SuperSPARC is connected directly to MBus,
this bit is used to indicate boot mode. When
SuperSPARC is used with the MXCC, the MXCC cannot
know that SuperSPARC is in boot mode; the MXCC
therefore never asserts this bit.
L Lock Indicator (MAD[44]). This is an advisory bit
to indicate resource locking. It is intended to lock a
slave to a particular master. The bus itself is locked
by keeping MBB asserted during the entire
operation. SuperSPARC asserts the lock bit when
performing atomic operations (SWAP or LDSTUB), or
when performing a page table walk. This bit is set by
the MXCC when performing a non-cached LDST
transaction initiated from the VBus. It is also set when
the MXCC performs a CR or a CRI operation that has
write-back operations pending, and during any
write-back operation.


C Cacheable Indicator (MAD[43]). When this signal is
asserted, it indicates that the data is cacheable.


SIZE SIZE[2:0] (MAD[42:40]). Indicates the size of the
operation in bytes (see Table 17-4). Neither the MXCC
nor the SSP ever initiates transactions with SIZE > 32 bytes.


Table 17-4. SIZE Format

SIZE[2] | SIZE[1] | SIZE[0] | Transaction Size
   0    |    0    |    0    | 1 byte
   0    |    0    |    1    | 2 bytes
   0    |    1    |    0    | 4 bytes
   0    |    1    |    1    | 8 bytes
   1    |    0    |    0    | 16 bytes
   1    |    0    |    1    | 32 bytes
   1    |    1    |    0    | 64 bytes*
   1    |    1    |    1    | 128 bytes*

* Not supported in SSP or MXCC.


TYPE TYPE[3:0] (MAD[39:36]). Indicates the type of
operation, as illustrated in Table 17-5.
Table 17-5. TYPE Format


TRANSACTION TYPE


Coherent Invalidate (CI)





physical address This represents the 36-bit physical address on 
MAD[35:0]. 
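The command word fields can be packed and unpacked with shifts and masks. The sketch below follows the bit positions given in this subsection; the struct and function names are hypothetical, TYPE is kept as a raw 4-bit code because the encodings of Table 17-5 are not reproduced here, and `cmd_size_bytes` assumes the SIZE encoding of Table 17-4 (2^SIZE bytes):

```c
#include <assert.h>
#include <stdint.h>

/* Fields of the address-phase command word (Figure 17-4). */
typedef struct {
    unsigned mid;        /* MAD[63:60] module identifier           */
    unsigned supervisor; /* MAD[59]    S: supervisor access        */
    unsigned va;         /* MAD[53:46] VA: virtual address [19:12] */
    unsigned boot_local; /* MAD[45]    M: boot mode / local bus    */
    unsigned lock;       /* MAD[44]    L: lock                     */
    unsigned cacheable;  /* MAD[43]    C: cacheable                */
    unsigned size;       /* MAD[42:40] SIZE code                   */
    unsigned type;       /* MAD[39:36] TYPE code (raw)             */
    uint64_t pa;         /* MAD[35:0]  physical address            */
} mbus_cmd;

/* Mask `v` to `width` bits and shift it to bit position `lo`. */
static uint64_t field(uint64_t v, unsigned lo, unsigned width) {
    return (v & ((1ull << width) - 1u)) << lo;
}

uint64_t cmd_encode(const mbus_cmd *c) {
    assert(c->mid < 16u && c->va < 256u && c->size < 8u && c->type < 16u);
    assert(c->pa < (1ull << 36));
    return field(c->mid, 60, 4) | field(c->supervisor, 59, 1)
         | field(0x1Fu, 54, 5)                /* reserved MAD[58:54]: driven high */
         | field(c->va, 46, 8) | field(c->boot_local, 45, 1)
         | field(c->lock, 44, 1) | field(c->cacheable, 43, 1)
         | field(c->size, 40, 3) | field(c->type, 36, 4)
         | field(c->pa, 0, 36);
}

mbus_cmd cmd_decode(uint64_t w) {
    mbus_cmd c;
    c.mid        = (unsigned)(w >> 60) & 0xFu;
    c.supervisor = (unsigned)(w >> 59) & 0x1u;
    c.va         = (unsigned)(w >> 46) & 0xFFu;
    c.boot_local = (unsigned)(w >> 45) & 0x1u;
    c.lock       = (unsigned)(w >> 44) & 0x1u;
    c.cacheable  = (unsigned)(w >> 43) & 0x1u;
    c.size       = (unsigned)(w >> 40) & 0x7u;
    c.type       = (unsigned)(w >> 36) & 0xFu;
    c.pa         = w & ((1ull << 36) - 1u);
    return c;
}

/* Bytes transferred, assuming SIZE encodes 2^SIZE bytes (Table 17-4). */
unsigned cmd_size_bytes(unsigned size_code) {
    assert(size_code < 8u);
    return 1u << size_code;
}
```

A 32-byte cacheable request from an SSP or MXCC would carry SIZE = 5 and VA = 0xFF, since those modules drive the VA bits high.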
17.4 MBus Operation


Level 1 


Level 2 


The MBus specification has two levels of compliance: Level 1 and Level 2.
Level 1 includes the basic MBus signals and transactions needed to design a
complete uniprocessor system. Level 2 introduces two additional signals and the
additional transactions needed to design a cache-coherent, shared-memory
multiprocessor.


The Level 1 MBus supports two transactions: read and write. These
transactions simply read or write a specified size of data at a specified physical
address. These transactions are supported using a subset of MBus
signals: a 64-bit multiplexed address/data bus (MAD[63:0]), an address
strobe signal (MAS), and an encoded acknowledgment on three signals (MRDY,
MRTY, and MERR). Additional Level 1 signals support arbitration for modules
(MBR, MBG, and MBB), as well as interrupts (IRL[3:0]), reset (RSTIN and
RSTOUT), asynchronous errors (AERR), and module identification (ID[3:0]).
The MBus reference clock (MCLK) completes the signal requirements for a
Level 1 system.


MBus assumes that there are central functional elements to perform reset, ar- 
bitration, interrupt distribution, time-out, and MBus clock generation. 


The Level 2 MBus includes all Level 1 transactions and signals and adds four
transactions and two signals to support cache coherency. This facilitates
the design of shared-memory multiprocessor systems. In Level 1, details of
the caches inside modules are not visible to MBus transactions. This
changes with Level 2, where many aspects of the caches are assumed
as part of the new MBus transactions. To participate in cache-consistent
sharing using Level 2 transactions, a cache must have a “write-back” policy, an
“allocate” policy on write misses, and a block or sub-block size of 32 bytes.
Cache lines are assumed to have at least five states (invalid, exclusive clean,
exclusive dirty, shared clean, and shared dirty).
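The five line states map naturally onto the MOESI terminology mentioned at the start of this chapter. The following sketch is illustrative only (the names are hypothetical) and records which states make a cache the owner of a line, that is, responsible for supplying the data with MIH and for copying the line back when it is replaced:

```c
#include <assert.h>
#include <stdbool.h>

/* The five line states an MBus Level 2 coherent cache is assumed to keep. */
typedef enum {
    LINE_INVALID,
    LINE_EXCLUSIVE_CLEAN,
    LINE_EXCLUSIVE_DIRTY,
    LINE_SHARED_CLEAN,
    LINE_SHARED_DIRTY
} line_state;

bool is_valid(line_state s)     { return s != LINE_INVALID; }
bool is_exclusive(line_state s) { return s == LINE_EXCLUSIVE_CLEAN || s == LINE_EXCLUSIVE_DIRTY; }

/* A dirty line makes this cache the owner of the block. */
bool is_owner(line_state s)     { return s == LINE_EXCLUSIVE_DIRTY || s == LINE_SHARED_DIRTY; }

/* Conventional MOESI letter for each state, in enum order. */
char moesi_letter(line_state s) {
    static const char letters[] = { 'I', 'E', 'M', 'S', 'O' };
    assert((unsigned)s < sizeof letters);
    return letters[s];
}
```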
The additional transactions present in Level 2 systems are Coherent Read
(CR), Coherent Invalidate (CI), Coherent Read and Invalidate (CRI), and
Coherent Write and Invalidate (CWI). The two additional signals are shared
(MSH) and inhibit (MIH). All coherent transactions have a SIZE of 32 bytes,
except for CI, which is not sized but which invalidates 32-byte cache blocks. The
cache-coherency protocol is a “write invalidate” protocol, in which the cache
being written issues a CI transaction if the line is not exclusive. This tells
all other caches to invalidate the line immediately, since it will contain
“stale data” after the write completes. All caches “snoop” CR transactions and
assert MSH if the address of the transaction is present in their cache. By
observing MSH, caches can update the state of the lines they hold. If a cache is
the “owner,” it asserts MIH to tell memory not to send data and then
supplies the data to the requesting cache. CRI and CWI are simply the
combination of a CI transaction with either a CR or a Write transaction. Their
purpose is to reduce the number of MBus transactions needed and thus conserve
bandwidth.


17.4.1 Timing of MSH, MIH, and MRDY
Different cache designs require differing numbers of cycles to complete a
snoop request and act on it by asserting MSH and MIH. For proper operation,
the memory controller must not acknowledge any data while MIH might still be
asserted on CR and CRI transactions.


The memory controllers of systems designed to operate with several types of
processors will need programmable timing for the minimum MRDY response
time for CR transactions. The minimum MRDY timing should be variable from
A+2 to at least A+7 to accommodate various processor modules. The
programmable minimum should apply to all CR and CRI transactions.


The SuperSPARC processor requires that MRDY occur on or after the A+3
cycle for all transactions in the direct MBus configuration. The MXCC requires
that MRDY occur on or after A+7 in asynchronous clocking configurations and
A+5 in synchronous clocking configurations.


At system startup, the memory controller's MRDY timing should be set to match
the minimum MRDY timing of the system's slowest snooping cache. The timing of all
coherent caches need not be the same within a system.


The MBus specification allows a minimum cycle time for a read transaction
of two cycles (a read cycle with an error acknowledgment asserted,
where no data is provided). Some processor modules will not behave
correctly with two-cycle transactions. Proper operation with all processors can
be assured by delaying the generation of MRDY, MRTY, and MERR to A+2, to be
compatible with normal read cycle timing. SuperSPARC and the MXCC
require that all transactions end on or after A+2.
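The rule of setting the memory controller to the slowest snooping cache, bounded below by A+2, reduces to taking a maximum. The helper below is a hypothetical start-up routine, not part of any SuperSPARC or MXCC interface; the constants are the values quoted in this subsection:

```c
#include <assert.h>
#include <stddef.h>

/* Minimum MRDY offsets (cycles after the address cycle A) quoted above. */
enum {
    MRDY_MIN_SSP_DIRECT = 3,   /* SuperSPARC, direct MBus configuration */
    MRDY_MIN_MXCC_SYNC  = 5,   /* MXCC, synchronous clocking            */
    MRDY_MIN_MXCC_ASYNC = 7,   /* MXCC, asynchronous clocking           */
    ACK_MIN_ANY         = 2    /* no transaction should end before A+2  */
};

/* Picks the minimum MRDY offset to program into the memory controller for
   CR/CRI transactions from the requirements of the installed snooping
   caches. Returns -1 if the controller cannot be programmed late enough. */
int program_min_mrdy(const int *module_min, size_t n, int controller_max) {
    assert(module_min != NULL || n == 0);
    int setting = ACK_MIN_ANY;
    for (size_t i = 0; i < n; i++)
        if (module_min[i] > setting)
            setting = module_min[i];          /* slowest snooping cache wins */
    return setting <= controller_max ? setting : -1;
}
```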


17.4.2 Module ID


All MBus slave interfaces, including SuperSPARC and the MXCC, accept an
ID[3:0] input that is used as an aid to system configuration. As Level 2
masters, SuperSPARC and the MXCC also drive the value on ID[3:0] onto
MAD[63:60] during the address phase of every transaction. The processor's
module number is determined at reset by sampling the MID[3:0] pins. The
MID[3:0] signals are connected to the SuperSPARC address bus, pins
ADDR[3:0]. The module number should be asserted statically by system
hardware. The number may be any value except 0000; a module number of 0000
references the reserved boot mode address space.


17.4.3 Wrapping


A wrapped transaction transfers the requested word first and then the rest of 
the requested block in sequential order, wrapping from the last word of the 
block to the first. 


Wrapping concerns the order in which data is delivered for multi-cycle
transfers. For MBus, Read (with a transfer size larger than eight bytes), CR,
and CRI transactions are wrapped transactions. Wrapping implies that the first
eight-byte double-word of data to be delivered is specified by the physical
address bits during the address phase (MAD[35:3]). The rest of the requested
sub-block is delivered in sequential order. After delivering the last
double-word of the sub-block, the next double-word delivered is wrapped to the
first double-word of the sub-block. See Table 17-6.


Table 17-6. Order of Wrapped Bytes







Bytes Requested | Bytes Returned In
                | 1st Cycle | 2nd Cycle | 3rd Cycle | 4th Cycle
0-7             | 0-7       | 8-15      | 16-23     | 24-31
8-15            | 8-15      | 16-23     | 24-31     | 0-7
16-23           | 16-23     | 24-31     | 0-7       | 8-15
24-31           | 24-31     | 0-7       | 8-15      | 16-23
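The delivery order in Table 17-6 is a rotation of the four double-words of the sub-block, starting at the one selected by physical address bits 4:3. A minimal sketch with a hypothetical function name, covering 32-byte (sub-block) transfers only:

```c
#include <assert.h>
#include <stdint.h>

#define SUBBLOCK_BYTES 32u
#define DW_BYTES        8u

/* Fills `order` with the starting byte offsets (within the 32-byte
   sub-block) of the double-words of a wrapped transfer, in delivery order. */
void wrapped_order(uint64_t pa, unsigned order[SUBBLOCK_BYTES / DW_BYTES]) {
    unsigned n = SUBBLOCK_BYTES / DW_BYTES;          /* 4 double-words       */
    unsigned first = (unsigned)(pa >> 3) & (n - 1u); /* PA[4:3]: first one   */
    assert(order);
    for (unsigned i = 0; i < n; i++)
        order[i] = ((first + i) % n) * DW_BYTES;     /* wrap back to byte 0  */
}
```

A request for byte 16 of a sub-block therefore returns bytes 16-23, 24-31, 0-7 and 8-15, as in the third row of the table.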


All SuperSPARC-based modules will issue wrapped requests. Wrapping can- 
not usually be disabled in a processor. Therefore, memory controllers must 
support wrapped requests. 


A Level 2 coherent cache that does not issue wrapped requests may not be
able to deliver data to wrapped CR or CRI transactions when it asserts MIH.
Such caches cannot be operated as coherent caches in a system with
processors that issue wrapped requests.
17.4.4 Cache-Consistency Protocol


The level-2 MBus specifies a single-ownership, write-invalidate, cache-con- 
sistency policy. Only a single cache may own a modified copy of a cache line 
at any one time. Multiple shared copies of modified or clean data may exist 
within the system at any time. These multiple copies are invalidated any time 
the cache line is modified (the modified data may then be re-read from the own- 
er and cached as modified and shared information). This protocol is described 
in the MBus specification. 


During any CR transaction, all processors in the system will assert the MSH
(Shared) signal if they currently have a cached copy of that data. Since several
caches may be sharing the information, the MSH signal is open drain and can
be asserted by all caches simultaneously. MSH must be pulled inactive (high)
external to any processor (typically by a resistive pull-up). MSH may be asserted
at any time prior to the first data acknowledgment for the transaction. MSH
must be asserted for at least one cycle.


If a cache owns the data for the transaction, it may intervene in a read
transaction by asserting the MIH (memory inhibit) signal. Since there may be only
one owner at a time for each cache line, only a single cache may assert MIH. See
Subsection 17.4.1 and Section 17.8 for information on the timing of MSH
and MIH.


Cache Consistency in Direct MBus Configuration 
Once the SSP asserts MIH in response to a read (CR or CRI) transaction, it
will complete the bus cycle by supplying data and a ready reply starting four
cycles later. During these four cycles, the memory controller can drive other
data and ready signals in response to the transaction for two cycles, followed
by a third cycle in which the bus is driven high. Any replies received during this
time are ignored by the SSP. No exceptions may be reported after asserting
MIH. Figure 17-5 represents the cache-consistency algorithm used by
SuperSPARC's data cache on the MBus.
Figure 17-5. MBus Cache Consistency State Diagram for Data Cache





The following symbols are used in the diagram: 

Prd Local read

Brd System bus read (CR) snoop hits

Pwr Local write

Bwr System bus invalidate (CI, CRI, or CWI) snoop hits
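The four events above can drive a simple transition function. The sketch below applies the generic MBus Level 2 rules described in this chapter (write-invalidate, ownership on write, exclusive-to-shared on a snooped CR, exclusive only if no MSH was seen) to the five line states. It is an illustration under those assumptions, not a transcription of Figure 17-5; all names are hypothetical, and the snoop signalling (MSH and MIH) is left out:

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { INV, EXC_CLEAN, EXC_DIRTY, SHR_CLEAN, SHR_DIRTY } line_state;
typedef enum { PRD, BRD, PWR, BWR } cache_event;      /* symbols listed above */
typedef enum { BUS_NONE, BUS_CR, BUS_CI, BUS_CRI } bus_op;

typedef struct {
    line_state next;   /* state after the event                    */
    bus_op     issue;  /* transaction this cache issues, if any    */
} transition;

/* msh_seen: whether MSH was observed during this cache's own CR (Prd miss). */
transition step(line_state s, cache_event e, bool msh_seen) {
    transition t = { s, BUS_NONE };
    assert((unsigned)e <= (unsigned)BWR);
    switch (e) {
    case PRD:                           /* local read                      */
        if (s == INV) {                 /* miss: fetch with a coherent read */
            t.issue = BUS_CR;
            t.next = msh_seen ? SHR_CLEAN : EXC_CLEAN;
        }
        break;
    case PWR:                           /* local write                     */
        if (s == INV)
            t.issue = BUS_CRI;          /* write miss                      */
        else if (s == SHR_CLEAN || s == SHR_DIRTY)
            t.issue = BUS_CI;           /* line not exclusive              */
        t.next = EXC_DIRTY;             /* the writer becomes the owner    */
        break;
    case BRD:                           /* another module's CR hits        */
        if (s == EXC_CLEAN)
            t.next = SHR_CLEAN;
        else if (s == EXC_DIRTY)
            t.next = SHR_DIRTY;         /* the owner keeps ownership       */
        break;
    case BWR:                           /* CI, CRI or CWI hits             */
        t.next = INV;                   /* invalidate regardless of state  */
        break;
    }
    return t;
}
```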


Figure 17-6 represents a simpler consistency algorithm used by
SuperSPARC's instruction cache on the MBus.
Figure 17-6. MBus Cache Consistency State Diagram for Instruction Cache


(Transition labels: Local CR (MSH is don't care); snoop hit; Bus CR snoop hit (assert MSH).)


17.4.5 MultiCache Controller Consistency in Full Module Configuration


MBus cache consistency in the full module configuration is similar to that of the 
direct MBus configuration. 


The ownership of a sub-block belongs to the external cache of the processor 
that last wrote to the sub-block. A sub-block remains owned until another 
cache attempts to write the sub-block or the sub-block is copied back to 
memory (usually when it is replaced in cache). 


Ownership is obtained or transferred from one cache to another through write
invalidation. When an SSP issues a write to a sub-block that is shared (S=1), the
MXCC issues a CI transaction on MBus and assumes ownership of the
sub-block. When the CI transaction is received by other caches on MBus, they
invalidate their copies of the sub-block. If the write is to a
sub-block that is not present in the E-cache (write miss), the MXCC issues a
CRI transaction on the bus. If another cache owns the sub-block, it supplies
the sub-block and inhibits memory from responding by asserting MIH. After
a CRI transaction, the issuer owns the sub-block, and all other caches have
invalidated any copies of the sub-block that they previously had.


The cache-consistency state of a sub-block in the external cache may be
changed by accesses from the SSP or from MBus. The relationships between
the states and accesses from the local processor and MBus are shown in
Figure 17-5.
17.5 SuperSPARC MBus Transactions (Generic)


This section outlines the generic MBus transactions. Along with an
explanation of each transaction, a timing diagram is provided to show the
usual order in which signals are asserted in the transaction. Section 17.6
provides the details specific to a SuperSPARC processor operating in
stand-alone mode on the MBus. Section 17.7 provides the details for a module
consisting of a SuperSPARC processor and the MXCC on the MBus.


17.5.1 Read


Figure 17-7 shows a simple eight-byte read operation. The master drives
address and status information on the MAD lines and asserts MAS for one cycle.
The addressed slave supplies the data by driving the MAD lines and asserting
the MRDY signal for one clock cycle for each 64-bit double-word transferred.
Figure 17-7. MBus Read Cycle
1. Control lines (MAS, MRDY, MERR, MRTY) are driven inactive for one clock
before being released.
2. MAD lines are held at their previously driven state by system bus holders.
3. MBB is driven high for 1/2 clock cycle before being released.


Figure 17-8 shows a 32-byte read operation. A read operation can be
performed on any size of data transfer that is specified by the SIZE bits. Read
transactions support wrapping (critical-word-first ordering). Transactions
involving fewer than eight bytes will have undefined data on the unused bytes.
Figure 17-8. MBus Burst Read Cycle
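Because the slave asserts MRDY once for each double-word, the number of data cycles in a read follows from the SIZE. A small sketch with hypothetical names, assuming power-of-two sizes up to 128 bytes and, for the second helper, no wait states between data cycles:

```c
#include <assert.h>

/* Number of MRDY acknowledgments for a transfer of `bytes` bytes: one per
   double-word, and one for any transfer smaller than a double-word. */
unsigned mrdy_count(unsigned bytes) {
    assert(bytes != 0 && bytes <= 128u && (bytes & (bytes - 1u)) == 0);
    return bytes < 8u ? 1u : bytes / 8u;
}

/* Cycle (relative to A) of the final MRDY when the slave first responds at
   A+first and then inserts no wait states. */
unsigned last_mrdy_cycle(unsigned bytes, unsigned first) {
    return first + mrdy_count(bytes) - 1u;
}
```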
17.5.2 Write


Figure 17-9 shows a simple eight-byte write operation. The master drives
address and status information on the MAD lines and asserts MAS for one cycle.
The master then drives the data on these same lines, starting at the next cycle.
The addressed slave signals completion by asserting the MRDY signal for one 
clock cycle for each eight bytes transferred. Figure 17-10 shows a 32-byte 
burst transfer. 


A write operation can be performed on any size of data transfer specified by 
the SIZE bits. Neither SuperSPARC nor the MXCC will issue a burst operation 
greater than 32 bytes. Write transactions involving fewer than eight bytes will 
have undefined data on the unused bytes. The writing master will immediately 
drive the data in the period after the address phase of the transaction and im- 
mediately after the receipt of each MRDY in transactions with SIZE greater 
than eight bytes. 
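The write data timing above can be tabulated from the cycles in which the slave asserts MRDY. In this hypothetical sketch, cycle numbers are relative to the address cycle A, and only the cycle in which the master first drives each double-word is computed:

```c
#include <assert.h>
#include <stddef.h>

/* drive[k] receives the cycle (relative to A) in which the master first
   drives double-word k of a write, given the cycles mrdy[] in which the
   slave acknowledged each double-word. */
void write_drive_cycles(const unsigned *mrdy, size_t n, unsigned *drive) {
    assert(n > 0 && mrdy[0] >= 1);
    drive[0] = 1;                          /* right after the address phase */
    for (size_t k = 1; k < n; k++) {
        assert(mrdy[k] > mrdy[k - 1]);     /* acknowledgments arrive in order */
        drive[k] = mrdy[k - 1] + 1;        /* right after the previous MRDY   */
    }
}
```

With acknowledgments at A+2, A+3, A+5 and A+6 (one wait state before the third), the master first drives the four double-words at A+1, A+3, A+4 and A+6.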
17.5.3 Coherent Read




A CR operation is a block read transaction that maintains cache consistency. 
Coherent reads are performed on data intended to be cached in the SSP's in- 
ternal and external caches. Participants in the CR transaction are the request- 
ing cache, any other caches that snoop, and memory (or second-level cache). 
Snooping caches can experience three possible read scenarios:


O A snooping cache that does not have a copy of the requested block
simply ignores the transaction.


O A snooping cache that has a copy of the requested block but does not
own it simply asserts MSH for one cycle during the bus transaction (in any
cycle from A+2 to A+7). It marks its copy as shared (if not already
marked as such). See Figure 17-11.


O A snooping cache that owns the requested block asserts both MSH and
MIH for one cycle during the bus transaction (in any cycle from A+2 to
A+7) and starts supplying the requested data no sooner than four cycles
after it asserted MIH. If its own copy of the block was labeled exclusive,
it is changed to shared; otherwise, no status change takes place for its
copy. Figure 17-12 shows a cycle in which a cache owns the data and
supplies it to the requesting master.
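The three snooping cases can be summarized as one response function. This sketch assumes the A+2 to A+7 signalling window and the four-cycle MIH-to-data delay stated above; the names are hypothetical, and the state change on the snooping cache's own copy is not modeled:

```c
#include <assert.h>
#include <stdbool.h>

#define SNOOP_FIRST 2u   /* MSH/MIH may be asserted from A+2 ...           */
#define SNOOP_LAST  7u   /* ... through A+7                                */
#define MIH_TO_DATA 4u   /* owner data no sooner than 4 cycles after MIH   */

typedef struct {
    bool     assert_msh;
    bool     assert_mih;
    unsigned earliest_data_cycle;  /* relative to A; 0 if not supplying data */
} cr_response;

/* Response of a snooping cache to another module's CR. `has_copy`: this cache
   holds a valid copy; `owns`: it owns the block; `snoop_cycle`: the cycle
   (relative to A) in which it signals. */
cr_response snoop_cr(bool has_copy, bool owns, unsigned snoop_cycle) {
    cr_response r = { false, false, 0 };
    assert(!owns || has_copy);
    if (!has_copy)
        return r;                          /* no copy: ignore the transaction */
    assert(snoop_cycle >= SNOOP_FIRST && snoop_cycle <= SNOOP_LAST);
    r.assert_msh = true;
    if (owns) {
        r.assert_mih = true;               /* inhibit memory, supply the data */
        r.earliest_data_cycle = snoop_cycle + MIH_TO_DATA;
    }
    return r;
}
```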
Figure 17-11. MBus Coherent Read of Shared Data
1. MSH may occur in any cycle from A+2 to A+7.
2. MSH is an open-drain signal. It is not driven inactive; the system pull-up resistor returns it to an inactive level.
Figure 17-12. MBus Coherent Read of Owned Data
On receiving the data block, the requesting master labels the block
exclusive if no MSH signal was asserted during the bus transaction. The earliest
that a slave is allowed to issue MRDY for a CR operation is system-dependent (see
Subsection 17.4.1 and Section 17.8). This ensures that MIH never occurs after
an acknowledgment (MIH concurrent with an acknowledgment is acceptable).


17.5.4 Coherent Invalidate 


An Invalidate operation can only be performed on a block (32 bytes). All
Invalidate operations are snooped by all snooping caches. If an invalidate
operation hits in a cache, that copy is invalidated immediately, regardless of its
state. Memory is responsible for the acknowledgment of the CI transaction.
This is accomplished on cycle A+2 or later. All acknowledgment types except
error acknowledgments are possible. Memory will only issue normal
acknowledgments to CI transactions, but third-level caches may issue other
acknowledgments, particularly R&R. It should also be noted that a CI transaction
has SIZE = 32 bytes during the address phase but expects only one
MRDY for the acknowledgment. If, in a particular system, caches cannot
guarantee to complete their invalidation before the A+2 cycle, the memory
controller for that system should delay the acknowledgment as appropriate.


CI MBus transactions are issued when a write is being performed into a shared
cache block. Before the write can actually be performed, all the other caches in
the system must have their local copies invalidated (write-invalidate
cache-consistency protocol). Snooping caches need not assert MSH. The
MAD[63:00] lines contain undefined data during the data phase cycles.
Figure 17-13 shows a CI operation.


If a CI transaction receives an R&R acknowledgment, the block that is about
to be written could be invalidated by an intervening invalidation transaction
on the bus. Therefore, when the cache regains the bus, it should issue a CRI
transaction, not a CI transaction, to allocate the (sub-)block again.
Figure 17-13. MBus Coherent Invalidate Operation
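The CI-to-CRI rule can be captured in a small retry helper. In the sketch below (hypothetical names; acknowledgments are reduced to "completed" and "R&R"), a write to a shared block starts with CI, and any R&R turns the next attempt into a CRI:

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TX_NONE, TX_CI, TX_CRI } coherent_tx;
typedef enum { ACK_DONE, ACK_RR } ack_kind;   /* simplified acknowledgments */

/* Transaction to issue after `issued` receives `ack`; TX_NONE means the
   write may proceed. */
coherent_tx next_after_ack(coherent_tx issued, ack_kind ack) {
    assert(issued != TX_NONE);
    if (ack == ACK_DONE)
        return TX_NONE;
    /* R&R: the bus was relinquished, so an intervening invalidation may have
       removed the local copy. A CI becomes a CRI; a CRI is simply reissued. */
    return issued == TX_CI ? TX_CRI : issued;
}

/* Replays a sequence of acknowledgments for a write to a shared block and
   returns the transaction that finally completed (TX_NONE if none did). */
coherent_tx completed_tx(const ack_kind *acks, size_t n) {
    assert(acks != NULL || n == 0);
    coherent_tx tx = TX_CI;                   /* write hit on a shared block */
    for (size_t i = 0; i < n; i++) {
        coherent_tx next = next_after_ack(tx, acks[i]);
        if (next == TX_NONE)
            return tx;
        tx = next;
    }
    return TX_NONE;
}
```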
17.5.5 Coherent Read and Invalidate


Since the MBus supports a write-invalidate type of cache-consistency
protocol, a special CRI transaction that combines a CR transaction with a CI
transaction was included to reduce the number of MBus coherent transactions.
Caches that are performing CR transactions with the knowledge that they in- 
tend to immediately modify the data can issue this transaction. 


Each CRI transaction is snooped by all system caches. If the address hits
and the cache does not own the block, that cache immediately invalidates
its copy of the block, no matter what state the data was in. If the address hits
and the cache owns the block, the cache asserts MIH and supplies the data.
When the data has been successfully supplied, the cache then invalidates
its copy of the block.


MSH is not driven during the CRI transaction. 


17.5.6 Coherent Write and Invalidate 


A Coherent Write and Invalidate transaction combines a block write
transaction with a CI transaction. Figure 17-14 shows a Coherent Write and
Invalidate operation.
Each Coherent Write and Invalidate transaction is snooped by all system
caches. If the address hits, caches invalidate their copies of the block, no
matter what state the data was in. Neither MIH nor MSH is asserted for
Coherent Write and Invalidate transactions.


Figure 17-14. MBus Coherent Write and Invalidate
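The snoop behavior for the four coherent transactions in Subsections 17.5.3 through 17.5.6 can be summarized in one function. This is a sketch with hypothetical names; it describes a cache whose copy is hit by the snooped address and does not model timing:

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { SNOOP_CR, SNOOP_CI, SNOOP_CRI, SNOOP_CWI } snooped_tx;

typedef struct {
    bool assert_msh;
    bool assert_mih;   /* the owner supplies the data instead of memory   */
    bool invalidate;   /* drop the local copy (after supplying it, for CRI) */
} snoop_action;

/* Action of a cache whose copy of the block is hit by another module's
   coherent transaction; `owns` is true if this cache owns the block. */
snoop_action on_snoop_hit(snooped_tx tx, bool owns) {
    snoop_action a = { false, false, false };
    switch (tx) {
    case SNOOP_CR:                     /* MSH always; MIH only from the owner */
        a.assert_msh = true;
        a.assert_mih = owns;
        break;
    case SNOOP_CRI:                    /* MSH is not driven                   */
        a.assert_mih = owns;
        a.invalidate = true;
        break;
    case SNOOP_CI:                     /* MSH need not be asserted            */
    case SNOOP_CWI:                    /* neither MIH nor MSH                 */
        a.invalidate = true;
        break;
    }
    assert(!(a.assert_msh && a.invalidate));
    return a;
}
```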
17.5.7 Acknowledgment Cycles


A master must correctly accept any acknowledgment type for any transaction it
issues. The earliest that an acknowledgment will be issued is system-dependent
(see Subsection 17.4.1 and Section 17.8). Caches that are supplying data as
part of a CR transaction may issue only normal or error acknowledgments; they
may not issue R&R or Retry acknowledgments. Table 17-7 shows the valid
acknowledgments that a slave may issue.
Table 17-7. Error and Retry Handling
When there is no bus activity, or when it is necessary to insert wait states
between the address cycle and the data cycle or between consecutive data
cycles, an addressed slave can simply refrain from asserting any transaction
acknowledgment signals (MERR, MRDY, and MRTY). Any number of wait
cycles can be inserted, as long as the total does not exceed the system
timeout interval.


Relinquish & Retry (R&R) 
When a slave device cannot accept or supply data immediately, it can perform
a Relinquish and Retry (R&R) acknowledgment cycle by asserting MRTY for
only one bus cycle. This indicates to the requesting master that it should
release the bus immediately so that the bus can be re-arbitrated and possibly
used by another master. This release of the bus provides at least one dead
cycle before the transaction can be retried. When a master that receives an
R&R regains bus mastership, it must reissue the same transaction from the
beginning. The exception is a CI transaction, which is reissued as a CRI (see
Subsection 17.5.4). R&R can only be issued on the first acknowledgment of a
transaction. It is the responsibility of the slave port to time the duration of
the transaction that is causing it to issue R&R and to return an ERROR2
acknowledgment to the correct master when its device-specific timeout interval
has passed and the master has reconnected to the slave.
There are two different cases in which slaves issue R&R acknowledgments:


• The first case is a slow device. If a device is slow to respond, the slave interface should wait a short interval (about one microsecond is recommended) and then issue an R&R acknowledgment. It should also capture the ID of the master (MID[3:0]) from the MAD lines during the address phase and enter a "port busy" state while waiting for the device attached to the slave to respond. The master will eventually reconnect, and the R&R process is repeated until either the device responds or the slave timeout interval is exceeded. The slave then issues the normal or the error acknowledgment, respectively, and exits the "port busy" state.

  If a master with an ID other than the one captured by the slave port accesses the slave port while it is in the "port busy" state, that master should simply be given an R&R acknowledgment.

• The second case is the resolution of deadlock situations in which a master port and a slave port share an MBus interface and simultaneous transactions on both ports require one transaction to back off. R&R requires the current owner of the MBus to relinquish ownership in order to resolve the deadlock. R&Rs used to resolve deadlocks are inherently stateless and do not require a "port busy" state.
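The slow-device case above amounts to a small state machine in the slave port. The following C sketch models it under stated assumptions: the names (`slave_port_t`, `slave_access`, the `ACK_*` codes) are hypothetical, time is passed in as a microsecond count, and the caller is assumed to have already waited the recommended short interval before reporting whether the device has responded. Deadlock-resolution R&Rs, which are stateless, are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical acknowledgment outcomes chosen by the slave port. */
typedef enum { ACK_VALID, ACK_RR, ACK_ERROR2 } slave_ack_t;

/* Hypothetical slave-port state for the slow-device R&R case. */
typedef struct {
    bool     busy;           /* "port busy" state */
    uint8_t  captured_mid;   /* MID[3:0] captured during the address phase */
    uint32_t busy_since_us;  /* time at which "port busy" was entered */
    uint32_t timeout_us;     /* device-specific slave timeout interval */
} slave_port_t;

/* Decide the acknowledgment for an access from master `mid`.
   `device_ready` says whether the attached device has responded
   (assumed: the ~1 us wait has already elapsed); `now_us` is the time. */
slave_ack_t slave_access(slave_port_t *p, uint8_t mid, bool device_ready,
                         uint32_t now_us)
{
    mid &= 0xFu;                          /* MID is a 4-bit field */

    if (!p->busy) {
        if (device_ready)
            return ACK_VALID;             /* device was fast enough */
        p->busy = true;                   /* enter "port busy" */
        p->captured_mid = mid;
        p->busy_since_us = now_us;
        return ACK_RR;                    /* tell the master to release the bus */
    }

    if (mid != p->captured_mid)
        return ACK_RR;                    /* a different master: just R&R */

    if (device_ready) {
        p->busy = false;                  /* device answered: normal ack */
        return ACK_VALID;
    }
    if (now_us - p->busy_since_us >= p->timeout_us) {
        p->busy = false;                  /* slave timeout exceeded */
        return ACK_ERROR2;
    }
    return ACK_RR;                        /* keep the master retrying */
}
```

Keeping only one captured MID mirrors the text: while the port is busy, every other master receives R&R regardless of whether the device is ready.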


Note that R&R can be issued to a transaction that is part of a locked sequence of transactions. By definition, all transactions in a locked sequence are addressed to the same device (e.g., main memory (or a second-level cache) or an I/O adapter). There is only one "port busy" state per device, so there is only one source of R&R for a locked sequence.


It should be noted that caches that assert MIH and then supply data are not 
permitted to issue R&R acknowledgements. Figure 17-15 and Figure 17-16 
show R&R acknowledgments. 
Figure 17-15. Relinquish and Retry (Parked)

Figure 17-16. Relinquish and Retry (Rearbitrate)

Valid Data Transfer


A valid or ready data transfer is indicated by a responding slave with the asser- 
tion of the MRDY transaction status bit. 


ERROR1 (Bus Error)


When the responding device asserts only the MERR transaction status bit, the requesting master interprets this as an external bus error. This error code can also be used to indicate a system-implementation-dependent error. Bus error is the suggested interpretation of an ERROR1 acknowledgment.


ERROR2 (Timeout)


This acknowledgment is expected to be generated by watchdog logic in the system after a set number of cycles have elapsed with the MBB signal asserted. The number of cycles in the timeout interval is system-implementation-dependent; an interval of 200 microseconds is recommended. This error code can also be used to indicate a system-implementation-dependent error. Timeout is the suggested interpretation of an ERROR2 acknowledgment.
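Because the recommended timeout is given in microseconds while the watchdog counts MBus cycles, a system designer has to convert between the two. The C sketch below shows one way such a watchdog could work. The 40 MHz bus clock in the example is an illustrative assumption, and `bus_watchdog_t` and `watchdog_tick` are hypothetical names, not part of any SuperSPARC interface. At 40 MHz, 200 microseconds corresponds to 200 x 40 = 8000 cycles.

```c
#include <stdbool.h>
#include <stdint.h>

/* Convert a timeout interval in microseconds to MBus clock cycles.
   One microsecond is `mbus_mhz` cycles. */
uint32_t timeout_cycles(uint32_t interval_us, uint32_t mbus_mhz)
{
    return interval_us * mbus_mhz;
}

/* Hypothetical system watchdog state. */
typedef struct {
    uint32_t limit;   /* timeout interval in cycles */
    uint32_t count;   /* consecutive cycles with MBB asserted */
} bus_watchdog_t;

/* Call once per MBus clock with the (logical) state of MBB.
   Returns true on the cycle in which the watchdog should drive an
   ERROR2 (Timeout) acknowledgment. */
bool watchdog_tick(bus_watchdog_t *w, bool mbb_asserted)
{
    if (!mbb_asserted) {
        w->count = 0;          /* bus released: restart the interval */
        return false;
    }
    return ++w->count == w->limit;  /* interval elapsed with MBB held */
}
```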
ERROR3 (Uncorrectable)

This acknowledgment is used mainly by the addressed memory controller to inform the requester that, in the process of accessing the data, it has encountered some sort of uncorrectable error (such as a parity error or an uncorrectable ECC error). This error code can also be used to indicate a system-implementation-dependent error. Uncorrectable error is the suggested interpretation of an ERROR3 acknowledgment.

Retry

This acknowledgment causes the master to restart the transaction from the MAS cycle without releasing the bus for re-arbitration. This differs from Relinquish and Retry in that it does not allow any other master to access the bus before the retry.

Note:

This acknowledgment should not be used with MXCC. See Subsection 17.6.9.
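The acknowledgment types above are encoded on the three MBus status signals. The C sketch below decodes them. It relies on the statements in this section (MRDY alone is a valid data transfer, MERR alone is ERROR1, MRTY alone is R&R, and no signal asserted is an idle or wait cycle); the remaining combinations follow the standard MBus encoding and should be checked against Table 17-7 and the MBus specification. Arguments are logical "asserted" flags, so the caller is assumed to have already converted from the active-low electrical levels. The names are hypothetical.

```c
#include <stdbool.h>

typedef enum {
    MBUS_IDLE,              /* nothing asserted: idle or wait state */
    MBUS_RELINQUISH_RETRY,  /* R&R */
    MBUS_VALID,             /* valid data transfer */
    MBUS_RESERVED,
    MBUS_ERROR1,            /* bus error */
    MBUS_ERROR2,            /* timeout */
    MBUS_ERROR3,            /* uncorrectable error */
    MBUS_RETRY
} mbus_ack_t;

/* Decode one cycle of MERR/MRDY/MRTY (true = asserted). */
mbus_ack_t mbus_decode_ack(bool merr, bool mrdy, bool mrty)
{
    static const mbus_ack_t table[8] = {
        MBUS_IDLE,              /* none             */
        MBUS_RELINQUISH_RETRY,  /* MRTY             */
        MBUS_VALID,             /* MRDY             */
        MBUS_RESERVED,          /* MRDY + MRTY      */
        MBUS_ERROR1,            /* MERR             */
        MBUS_ERROR2,            /* MERR + MRTY      */
        MBUS_ERROR3,            /* MERR + MRDY      */
        MBUS_RETRY              /* MERR + MRDY + MRTY */
    };
    unsigned code = (merr ? 4u : 0u) | (mrdy ? 2u : 0u) | (mrty ? 1u : 0u);
    return table[code];
}
```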
17.6 MBus for SuperSPARC Processor Without the MXCC


17.6.1 Compatibility 


The following section deals with information specific to a SuperSPARC pro- 
cessor connected directly to the MBus with no MXCC (the direct MBus configu- 
ration in Table 17-1 and Figure 17-1). For details on MBus behavior of the Su- 
perSPARC Processor with the MXCC, see Section 17.7. 


The SSP can interoperate on MBus with some, but not all, other MBus processor modules. In particular, modules that require the virtual address superset bits will not be compatible. The SSP responds properly to all standard MBus transactions, though it does not produce all of them. The only burst transactions the SSP supports are 32-byte wrapped bursts, and all cache-consistent transactions operate on 32-byte sub-blocks. All responses on MSH and MIH are provided in the A+3 cycle (the third cycle after the address is asserted on the bus). SuperSPARC can operate in a system with variable response times, up to A+7, and MBus systems should be designed to accept these timing variations. Even though SuperSPARC may be protocol-compatible with some modules, electrical and signal timing differences may make certain modules incompatible. See the SuperSPARC data sheet for pin-timing details.


17.6.2 Selecting MBus

MBus operation is selected by driving the CCMODE pin inactive. This signal should be driven statically by the system; any change in its state after the deassertion of RESET will result in unpredictable operation. Software can determine the state of CCMODE, and therefore whether the direct MBus interface is selected, by reading the MCNTL.mb bit.

17.6.3 Port Register

The MBus port register contains vendor identification information for the component. Processors respond to Read transactions to an address that is based on the current module ID value; see the MBus specification for further details. SuperSPARC returns the value 0x00000004 on MAD[31:0] in response to any read of its port register. This indicates device 0, revision 0, for the vendor Texas Instruments (vendor number 4). The version number may change in future releases of the component.
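Only two port-register values appear in this chapter: 0x00000004 (SuperSPARC: device 0, vendor 4) and 0x00000104 (MXCC: device 1, vendor 4). Both are consistent with the vendor code occupying the low byte and the device number the next byte, which is what the hypothetical decoder below assumes. The exact field widths and the position of the revision field are not given in this section and should be taken from the MBus specification.

```c
#include <stdint.h>

/* Hypothetical decoded port-register fields (layout assumed, see text). */
typedef struct {
    uint8_t vendor;   /* assumed: MAD bits 7:0  */
    uint8_t device;   /* assumed: MAD bits 15:8 */
} port_id_t;

port_id_t decode_port_register(uint32_t mad)
{
    port_id_t id;
    id.vendor = (uint8_t)(mad & 0xFFu);
    id.device = (uint8_t)((mad >> 8) & 0xFFu);
    return id;
}
```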


17.6.4 SuperSPARC Processor as MBus Master

Read

The Read transaction is used for read operations of both code and data that will not be snooped by other caches (non-cacheable). All non-cacheable reads cause the store buffer to drain before the transaction begins.

Only single-cycle transfers (64 bits maximum) are used; SuperSPARC does not use burst transfers for these reads. The transaction may be a byte, half-word, word, or double-word in length. Big-endian word ordering is used (byte 0 appears on the high-order bits of the bus).

The A+2 cycle is the earliest cycle in which SuperSPARC can accept an acknowledgment for a Read.

Write

Non-cacheable stores are queued in SuperSPARC's store buffer when it is enabled (MCNTL.SB), and SuperSPARC does not wait for the stores to complete.

As soon as SuperSPARC receives a bus grant, the queued non-cacheable store is issued to the bus as a Write transaction. Burst writes are not used for non-cacheable writes. Write transactions may be byte, half-word, word, or double-word in size. Any errors on the Write transaction are reported as deferred data store errors. For synchronization, any cacheable store waits until the store buffer has completed all non-cacheable stores before beginning.

The A+2 cycle is the earliest cycle in which SuperSPARC can accept an acknowledgment for a Write.


Note: Atomic Transactions

Non-cacheable atomic transactions are caused by execution of either SWAP or LDSTUB instructions. These operations are performed as a locked sequence of Read and Write transactions. The lock bit field in the address phase of the read and write transactions is set. The SSP will not release the bus between the two transactions unless explicitly requested to by a Relinquish & Retry (R&R) reply. R&R replies are accepted for both the read and write portions of atomic transactions.





Note: Page-Table Walk Operations 


All page tables should be treated as non-cacheable in SuperSPARC direct 
MBus systems. As a result, only Level-1 transactions are used to reference 
page tables. During a table walk, SuperSPARC maintains bus ownership, 
unless explicitly told to release by an R&R response. The lock bit in the MAD 
field is set. All levels of the page table are read with standard single-word 
read transactions. Updates to R and M bits, or just M bits, are done with stan- 
dard write operations. Updates to the R bit only will be done with non-cache- 
able atomic swaps (see non-cacheable atomic references, above). This 
mechanism eliminates potential status bit inconsistencies when multiple 
MMUs are attempting to update page tables simultaneously. See Section 
9.4. 
Coherent Read

CR transactions are used to read data from the current owner. The owner may
be main memory or another cache. CR transactions will be used for all data 
cache load misses and for all instruction cache misses. If the data is owned 
by another cache, it will respond by asserting MIH and providing the data. 


All CR transactions use wrapping (see Subsection 17.4.3), and the starting ad- 
dress specified is the first double-word transferred. Once the needed data ar- 
rives, the processor uses it immediately. 


Any processor or cache that has a valid cached copy of the data referenced 
by the CR transaction must assert the MSH signal to indicate that data is 
shared. SuperSPARC can accept the assertion of the MSH signal at any time 
up to or concurrent with the receipt of the first data word. 


If the data is owned by another cache, that cache asserts MIH. The SSP ig- 
nores any data-ready (MRDY) responses until four cycles beyond the asser- 
tion of MIH. This allows memory controllers to begin transmitting data sooner 
than they might have otherwise. Memory controllers should not respond with 
data until a time equal to the longest MIH delay of any cache in the system. 


The SSP can accept MIH up to and concurrent with the first MRDY. The earliest cycle in which an acknowledgment can be accepted is A+2, provided that MIH has not been asserted. If MIH is asserted by any cache after MRDY has been asserted, the resulting behavior is unpredictable.
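The CR timing rules for the SSP as master can be summarized as a predicate on the cycle number. In this sketch, cycles are counted from the address cycle (A = 0), `NO_MIH` is a hypothetical marker meaning no cache asserted MIH, and the system is assumed to obey the rule that MIH, if asserted, arrives no later than the first MRDY (the unpredictable case is not modeled).

```c
#include <stdbool.h>

#define NO_MIH (-1)   /* hypothetical: no cache asserted MIH */

/* Would the SSP accept an MRDY seen in `cycle` (A = 0), given the cycle
   in which MIH was asserted, or NO_MIH? */
bool ssp_accepts_mrdy(int cycle, int mih_cycle)
{
    if (cycle < 2)
        return false;                 /* earliest acknowledgment is A+2 */
    if (mih_cycle == NO_MIH)
        return true;                  /* memory supplies the data */
    return cycle >= mih_cycle + 4;    /* owner's data comes 4 cycles after MIH */
}
```

With a snooping SSP asserting MIH at A+3, the first MRDY the master accepts is at A+7, which matches the owner timing given in Subsection 17.6.5.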


Coherent Read and Invalidate


CRI transactions are used to read data from the current owner for write-allo- 
cate operations. Use of these transactions implies that the data will be modi- 
fied upon arrival at the processor. The current owner should relinquish owner- 
ship, and all copies of the cache block should be invalidated. After the transac- 
tion completes, the SSP issuing the CRI transaction is the exclusive owner of 
the cache block. 


CRI behaves similarly to the CR transaction with respect to wrapping and tim- 
ing issues. 


Coherent Write and Invalidate


The SSP does not issue the CWI transaction. MBus does not provide any true 
Coherent Write (CW) transactions. Due to the write-allocate scheme used by 
the SSP's caches, a processor must own data before it may write to it. As a 
result, all cacheable write operations are done locally within a processor 
cache. The Write transaction is used for all non-cacheable stores and for 
cache copy-back operations. The SSP will relinquish ownership of a cache 
block after a copy-back operation (that uses a Write transaction). 
Coherent Invalidate

The SSP requests only a single type of pure-consistency operation (an operation that transfers no data but affects the state of a cached block): the CI transaction. A CI transaction invalidates all cached copies of a block in the system. It is used when the processor attempts a store to a shared line, regardless of ownership. After the CI transaction, the issuing processor becomes the exclusive owner of the cache block. The processor waits for the CI transaction to complete properly before performing the internal write.


The SSP expects an acknowledgment, if one is sent, at A+2. The processor can in fact accept an acknowledgment at A+1, but systems normally cannot generate acknowledgments that quickly.


17.6.5 SuperSPARC as a Snooping Cache Directly on MBus

The SSP maintains cache consistency by snooping on the MBus. As a cache snooper, SuperSPARC ignores the non-cacheable Read and Write transactions; the relevant transactions are therefore all coherent transactions.

Coherent Read

The SSP responds with the proper MSH and MIH signals in response to a CR transaction on the bus. These signals are driven in the A+3 cycle.

If the SSP owns the cache block, it waits an additional four cycles after asserting MIH, until A+7, and then responds with the data from its internal data cache. The MSH pin is driven as an open-drain signal: it is only driven low and must be pulled inactive externally. Whenever MIH is driven, it is driven low for a cycle and then driven high for a cycle; the driver then returns to high impedance.


Coherent Read and Invalidate 
The SSP responds to a CRI transaction snoop hit in a manner similar to a CR transaction snoop hit. The timing is the same as that for a CR snoop hit (A+3 for MIH, and A+7 for acknowledgment and data), but MSH is never asserted in a CRI transaction. MIH is asserted if the processor owns the data.
Note:

Four cycles are required between the assertion of MIH and MRDY for CR and CRI transactions because of the following timing:

cycle 1: Owner asserts MIH; memory controller asserts MRDY.
cycle 2: Memory controller asserts the second MRDY and "sees" the owner's MIH.
cycle 3: Memory controller starts to disable its drivers.
cycle 4: Three-state.





Coherent Write and Invalidate

CWI is a combined transaction that has the effect of a Write transaction followed immediately by a CI transaction. Since Write transactions are non-coherent, the SSP does not snoop the write portion; it treats Coherent Write and Invalidate transactions as CI transactions. The SSP can accept these every other bus cycle, and no acknowledgment is required.

Coherent Invalidate

The SSP treats the two pure-consistency operations, in which no data is requested, as simple invalidations. The two pure-consistency transactions are CI and CWI. Since no reply from the processor is required, SuperSPARC can accept CI transactions at the maximum bus rate, every other cycle. The rate of these transactions is controlled by system logic and will generally be slower than two cycles per transaction.
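The snoop responses of a directly connected SSP can be summarized as a function of the transaction type and the state of the snooped line. The C sketch below is a simplification: the three line states and all names are hypothetical (the SSP's internal cache states are not described in this section), and only the MSH/MIH decision, the A+7 data cycle for an owned block, and invalidation are modeled.

```c
#include <stdbool.h>

typedef enum { TXN_CR, TXN_CRI, TXN_CI, TXN_CWI } coherent_txn_t;
/* Hypothetical, simplified line states. */
typedef enum { LINE_INVALID, LINE_SHARED_CLEAN, LINE_OWNED } line_state_t;

typedef struct {
    bool msh;         /* assert MSH in A+3 */
    bool mih;         /* assert MIH in A+3 */
    int  data_cycle;  /* cycle (A = 0) in which data is supplied, or -1 */
    bool invalidate;  /* invalidate the local copy */
} snoop_resp_t;

snoop_resp_t ssp_snoop(coherent_txn_t txn, line_state_t st)
{
    snoop_resp_t r = { false, false, -1, false };
    if (st == LINE_INVALID)
        return r;                          /* snoop miss: no response */

    switch (txn) {
    case TXN_CR:
        r.msh = true;                      /* a valid copy exists */
        if (st == LINE_OWNED) {
            r.mih = true;                  /* owner supplies the data */
            r.data_cycle = 7;              /* four cycles after MIH */
        }
        break;
    case TXN_CRI:                          /* MSH is never asserted */
        if (st == LINE_OWNED) {
            r.mih = true;
            r.data_cycle = 7;
        }
        r.invalidate = true;               /* requester becomes exclusive owner */
        break;
    case TXN_CI:
    case TXN_CWI:
        r.invalidate = true;               /* simple invalidation, no reply */
        break;
    }
    return r;
}
```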


17.6.6 SuperSPARC Processor as MBus Slave 


Read 


The SSP behaves as a slave only when its port register is read. Only Read transactions to the port register are supported; an SSP slave ignores all Write, CR, CRI, CWI, and CI transactions.

MBus port registers may be read using non-cacheable accesses to physical addresses in the range 0xff100000 to 0xfff00000, depending on the module number being addressed. In this case, the SSP behaves as a slave and responds no earlier than the A+3 cycle.


It is legal for a processor to address its own port register. This is the only case 
of snooping on non-coherent read transactions. In this case, the processor in 
fact acts as both master and slave on the same transaction. 


17.6.7 Store Buffer Operation in Direct MBus Configuration 


The SSP's store buffer will be used to buffer all copy-back data, as well as 
non-cacheable stores in direct MBus configuration. Any errors on these trans- 
actions will be reported as deferred data store errors. 
The SSP's store buffer maintains consistency. During copy-back operations,
the store buffer becomes the owner of a cache block. The store buffer snoops 
all coherent operations and responds appropriately with the status of the 
block. The contents of the copy-back buffer are always owned. Data is always 
returned in critical word first order. Table 17-8 describes the actions taken if 
a coherent transaction hits copy-back data in the SSP's store buffer. 


Table 17-8. Store Buffer Copy-Back Snoop Hit Actions

Transaction | Action
CR          | MIH, MSH, supply data, continue copy-back
CRI         | MIH, supply data, cancel copy-back


17.6.8 Bus Arbitration 
The SSP requests use of the MBus with the MBR signal, which is asserted whenever an internal bus cycle is pending. Since the SSP issues some transactions speculatively, the request for a particular transaction may disappear after the bus has been requested; if this occurs, MBR is deasserted immediately. If the internal request is still valid when the MBus is granted, the transaction occurs. The SSP attempts to overlap arbitration with current bus cycles (including its own bus cycles). A processor is granted future use of the bus by receiving the MBG signal. When MBG is received, the bus may still be busy servicing the previous owner; this is indicated by the MBB (MBus busy) signal. MBR is deasserted as soon as MBG is asserted. To gain ownership of the bus, the processor waits for its MBG signal to be active and MBB to be deasserted; it then asserts MBB. After arbitration is complete, the processor asserts MAS to begin the transaction. Bus busy is maintained during locked and atomic transactions (such as table walks). If the SSP already owns the bus, there is no arbitration delay before subsequent transactions begin.
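The arbitration sequence described above can be sketched as a per-cycle state machine. This is a hypothetical model, not SuperSPARC logic: each call represents one bus clock, `other_mbb` stands for MBB still being held by the previous owner, and in the owner state `pending` is taken to mean that another transaction is ready to start. Data cycles and locked sequences are not modeled.

```c
#include <stdbool.h>

typedef enum { ARB_IDLE, ARB_REQUEST, ARB_GRANTED, ARB_OWNER } arb_state_t;

typedef struct {
    arb_state_t state;
    bool mbr;   /* this module is asserting MBR */
    bool mbb;   /* this module is asserting MBB */
    bool mas;   /* this module asserts MAS this cycle */
} arb_t;

/* Advance one MBus clock. */
void arb_step(arb_t *a, bool pending, bool mbg, bool other_mbb)
{
    a->mas = false;                       /* MAS is a one-cycle strobe */
    switch (a->state) {
    case ARB_IDLE:
        if (pending) { a->mbr = true; a->state = ARB_REQUEST; }
        break;
    case ARB_REQUEST:
        if (!pending) {                   /* speculative request vanished */
            a->mbr = false;
            a->state = ARB_IDLE;
        } else if (mbg) {                 /* granted: drop MBR at once */
            a->mbr = false;
            a->state = ARB_GRANTED;
        }
        break;
    case ARB_GRANTED:
        if (mbg && !other_mbb) {          /* previous owner is finished */
            a->mbb = true;
            a->state = ARB_OWNER;
        }
        break;
    case ARB_OWNER:
        if (pending) {
            a->mas = true;                /* no arbitration delay while owner */
        } else {
            a->mbb = false;               /* release the bus */
            a->state = ARB_IDLE;
        }
        break;
    }
}
```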
17.6.9 Error and Retry Handling


The MBus defines several different transaction responses. Successful bus cycles are terminated with the valid data transfer acknowledgment (MRDY). Unsuccessful transactions may be terminated with one of several error responses, or with a retry. The SSP supports all error responses but only one of the retry responses. All error responses apply to the current data transfer only: any data received with a valid data transfer response is assumed correct and used internally. If an error occurs during the transfer of a cache block, the internal cache line is not validated, although all data returned before the error may be used internally. See Table 17-9.


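Table 17-9. Error and Retry Handling

MERR | MRDY | MRTY | Description
H    | H    | H    | Idle Cycle
H    | H    | L    | Relinquish and Retry (R&R)
H    | L    | H    | Valid Data Transfer
H    | L    | L    | RESERVED
L    | H    | H    | Bus Error (ERROR1)
L    | H    | L    | Timeout (ERROR2)
L    | L    | H    | Uncorrectable Error (ERROR3)
L    | L    | L    | Retry

1 Guaranteed by design but not tested.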
Errors 


Three error responses are defined: ERROR1 (Bus Error), ERROR2 (Timeout), and ERROR3 (Uncorrectable). The SSP treats all of these errors in the same manner and sets the MFSR to indicate the exact error response, as shown in Table 17-10.


Table 17-10. MFSR Response to Bus Errors

ERROR1 | Bus Error | MFSR.BE

Asynchronous Errors 


The SSP asserts the AERR signal whenever the processor enters error mode or whenever an exception occurs during a store buffer copy-out. AERR remains asserted until the exception is cleared from the MFSR (the register is cleared on read).
Relinquish & Retry (R&R)

The SSP supports R&R replies only on the first response to any transaction, including copy-back operations. After receiving an R&R, SuperSPARC releases bus ownership and then attempts to retry the transaction.

Retry

SuperSPARC does not support the Retry reply; it treats any Retry reply as the equivalent of an ERROR3 reply. SuperSPARC expects that all data returned to the processor with a valid data transfer (MRDY) response is truly correct, because this data is used immediately by the pipeline. SuperSPARC cannot tolerate late error responses under any circumstances.
17.7 SuperSPARC Processor With the MultiCache Controller on MBus


The following section deals with information specific to the SSP and the MXCC module connected to the MBus. This is the Full Module MBus configuration in Table 17-1 and Figure 17-2. For details on SuperSPARC's behavior when it is connected to the MBus without the MXCC, see Section 17.6.


17.7.1 E-Cache 


The MXCC controls SuperSPARC's external cache memory (E-cache). The E-cache is a direct-mapped, copy-back cache that is 1 Mbyte in size in MBus configurations. In MBus configurations, the sub-block size is 32 bytes. There are four sub-blocks per cache block, so each block contains 128 bytes.


The E-cache is a unified cache: it holds both instructions and data. It maintains the inclusion property with respect to the SSP's on-chip data and instruction caches; every cache block in either of the SSP's caches is also in the E-cache.
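Given the E-cache geometry above (1 Mbyte, direct-mapped, 128-byte blocks of four 32-byte sub-blocks), a physical address splits into a tag, a block index (1 Mbyte / 128 bytes = 8192 block frames), a sub-block number, and a byte offset. The C sketch below shows only that arithmetic; the struct and function names are hypothetical, and the MXCC's actual tag format is not described in this section.

```c
#include <stdint.h>

#define ECACHE_SIZE     (1u << 20)  /* 1 Mbyte in MBus configurations */
#define ECACHE_BLOCK    128u        /* four 32-byte sub-blocks */
#define ECACHE_SUBBLOCK 32u

/* Hypothetical decomposition of a physical address. */
typedef struct {
    uint64_t tag;       /* address bits above the cache size */
    uint32_t index;     /* block frame, 0..8191 */
    uint32_t subblock;  /* sub-block within the block, 0..3 */
    uint32_t offset;    /* byte within the 32-byte sub-block */
} ecache_addr_t;

ecache_addr_t ecache_split(uint64_t paddr)
{
    ecache_addr_t a;
    a.offset   = (uint32_t)(paddr % ECACHE_SUBBLOCK);
    a.subblock = (uint32_t)((paddr / ECACHE_SUBBLOCK) %
                            (ECACHE_BLOCK / ECACHE_SUBBLOCK));
    a.index    = (uint32_t)((paddr / ECACHE_BLOCK) %
                            (ECACHE_SIZE / ECACHE_BLOCK));
    a.tag      = paddr / ECACHE_SIZE;
    return a;
}
```

Because the cache is direct-mapped, two addresses exactly 1 Mbyte apart land in the same block frame and differ only in tag, so they evict each other.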


If the MXCC is connected to an E-cache, startup software should initialize the cache before enabling it by setting the cache enable (CE) bit in the MXCC's configuration register. When the E-cache is disabled, MSH and MIH are always deasserted, and normal cacheable accesses bypass the E-cache. In this case, the MXCC passes CI, CRI, and CWI transaction snoop requests on to the SSP so that the processor can invalidate its on-chip caches.


17.7.2 Selecting MBus 


MBus operation is selected by driving the MBSEL pin on the MXCC high. This signal should be driven statically by the system. The SSP's CCMODE pin must be held low to select the processor's VBus interface to the MXCC. Any change in the state of these signals during operation of the devices will result in unpredictable operation.


17.7.3 Port Register 


The MBus port register contains vendor identification information for the com- 
ponent. Processors respond to read transactions to an address that is based 
on the current module ID value. See the MBus specification for further details. 
The MXCC returns the value 0x00000104 on MAD[31:0] in response to any 
read of its port register. This indicates device 1, revision 0 for the vendor Texas 
Instruments (vendor number 4). The version number may change in future re- 
leases of the component. 
17.7.4 Synchronous and Asynchronous Clocking

The MXCC can be clocked in two different ways. Asynchronous clocking results if PCLK (the clock on the processor side of the MXCC) is faster than BCLK (the clock on the bus side of the MXCC). Because of the design of the internal synchronizers, in asynchronous mode PCLK must be at least 300 kHz faster than BCLK, and the ratio of PCLK to BCLK must not exceed 3 to 1.

In synchronous mode, PCLK and BCLK must be connected to the same clock, and there should be very little skew between the two clock inputs (no more than 150 ps). See Section 23.3.
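The asynchronous clocking constraints can be checked mechanically when choosing oscillator frequencies. The following sketch uses a hypothetical function name and takes frequencies in kHz so that the 300 kHz margin stays an integer; it checks only the two asynchronous-mode rules stated above, not the synchronous-mode skew limit.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if PCLK/BCLK (in kHz) satisfy the MXCC asynchronous-mode rules. */
bool mxcc_async_clocks_ok(uint32_t pclk_khz, uint32_t bclk_khz)
{
    if ((uint64_t)pclk_khz < (uint64_t)bclk_khz + 300u)
        return false;                          /* need >= 300 kHz margin */
    if ((uint64_t)pclk_khz > 3ull * bclk_khz)
        return false;                          /* ratio must not exceed 3:1 */
    return true;
}
```

For example, a 50 MHz PCLK with a 40 MHz BCLK passes, while a 40.2 MHz PCLK with the same BCLK fails the 300 kHz margin.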


17.7.5 MXCC Master on MBus

Read

The Read transaction is used for read operations of both code and data that will not be snooped by other caches (non-cacheable). Reads can be performed on any size of data transfer, as specified by the SIZE bits. The MXCC issues no burst operation greater than 32 bytes. Read transactions involving fewer than eight bytes have undefined data in the unused bytes. Big-endian word ordering is used (byte 0 of a word appears on the high-order bits of the bus).

The A+2 cycle is the earliest cycle in which the MXCC can accept an acknowledgment for a Read.

Write

Write transactions are not snooped by other caches (non-cacheable). The MXCC issues no burst write transaction greater than 32 bytes. Bytes, half-words, words, and double-words may all be written, with big-endian ordering. Write transactions involving fewer than eight bytes have undefined data in the unused bytes.

The A+1 cycle is the earliest cycle in which the MXCC can accept an acknowledgment for a Write.


Note: Atomic Transactions

Non-cacheable atomic transactions are caused by execution of either SWAP or LDSTUB instructions on non-cacheable data. These operations are performed as a locked sequence of Read and Write transactions. The lock bit field in the address phase of the read and write transactions is set. The MXCC will not release the bus between the two transactions unless explicitly requested to by an R&R reply. R&R replies are accepted for both the read and write portions of atomic transactions.





MBus

Subject to Change Without Notice

SuperSPARC Processor With the Multicache Controller on MBus

Coherent Reads


CR transactions are used to read data from the current owner. The owner may be main memory or another cache. The participants in a CR transaction are the requesting cache, the snooping caches, and memory. If the data is owned by another cache, that cache responds by asserting MIH and providing the data. CR transactions are used by the block copy logic to ensure coherency during block copies, and by the E-cache controller logic to read data into the E-cache.


All CR transactions use wrapping (see Subsection 17.4.3), and the starting ad- 
dress specified is the first data transferred. Once the needed data arrives, the 
processor will use it immediately. 


Any processor or cache that has a valid cached copy of the data referenced 
by the CR transaction asserts the MSH signal to indicate that the data is 
shared. With either asynchronous or synchronous clocking, the MXCC can ac- 
cept the assertion of the MSH signal at any time up to and concurrent with the 
receipt of the first data word. 


If the data is owned by another cache, the MXCC in either clocking mode ignores any data-ready (MRDY) responses until two cycles beyond the assertion of MIH. Memory controllers should not respond with data until a time equal to the maximum MIH delay for any cache in the system.

The MXCC accepts MIH up to and concurrent with the first MRDY. No cache should ever assert MIH after MRDY, as the resulting behavior is unpredictable.


Coherent Read and Invalidate


CRI transactions read data from the current owner. Use of these transactions 
implies that the data will be modified upon arrival at the MXCC. The current 
owner should relinquish ownership, and all copies of the cache line should be 
invalidated. After the transaction completes, the MXCC should be the exclu- 
sive owner of the cache line. 


CRI transactions behave similarly to CR transactions with respect to wrapping 
and timing issues. 
Coherent Write and Invalidate

CWI transactions combine a block write with a CI. Each CWI is snooped by all caches. If the address hits, the caches invalidate their copies of the block, no matter what state the data was in. Coherent Write and Invalidate transactions behave like a Write, except that snooping caches invalidate copies of the block. All sizes are allowed, but only a single 32-byte block is invalidated, regardless of the SIZE specified. Due to the nature of the MBus coherency protocol, neither MIH nor MSH can be asserted. The MXCC can accept the first acknowledgment for a CWI transaction in the A+2 cycle.

CWI transactions are used by the MXCC's block copy logic to ensure coherency during block copies and block zeros.

Coherent Invalidate

The MXCC uses the CI as a pure-consistency operation (an operation that transfers no data but affects the state of a cached block). A CI operation invalidates all cached copies of a block in the system. It is used when the MXCC attempts a store to a shared block, regardless of ownership. After the CI operation, the MXCC becomes the exclusive owner of the cache block.

The MXCC expects an acknowledgment at A+2. The acknowledgment for a CI operation is sent by the memory that holds the data; the caches that hold copies do not acknowledge a CI or a CWI operation.


17.7.6 MXCC as a Snooping Cache on MBus

The MXCC maintains cache consistency by snooping on the MBus. As a cache snooper, the MXCC ignores the non-cacheable Read and Write transactions; the relevant transactions are thus all coherent transactions.

Coherent Read

The MXCC responds with the proper MSH and MIH signals to cache hits on CR transactions on the bus.

With asynchronous clocking, the MXCC asserts MIH and/or MSH between A+5 and A+7, inclusive. If the MXCC owns the cache block, the acknowledgment and data can come anywhere from A+14 (best case) to A+23 (worst case). If MIH is not asserted, the acknowledgment and data come in cycle A+7 or later.

With synchronous clocking, MIH and/or MSH are asserted at A+5. If the MXCC owns the cache block, the acknowledgment and data can come anywhere from A+16 (best case) to A+23 (worst case). If MIH is not asserted, the acknowledgment and data come in cycle A+5 or later.
Coherent Read and Invalidate


The MXCC responds to a CRI transaction snoop hit very similarly to a CR 
transaction snoop hit. The timing remains the same as that of a CR transaction 
snoop hit (for both asynchronous and synchronous clocking), but MSH will 
never be asserted in a CRI transaction. MIH will be asserted if the processor 
owns the data. 


Coherent Write and Invalidate

CWI transactions are snooped by all system caches. If an address hits, the data is invalidated, no matter what state it was in. While CWIs may be of any size, only a single 32-byte block is invalidated.

The MXCC does not acknowledge CWI. The MXCC can respond to at most one CWI every fourth cycle.

Coherent Invalidate

The MXCC treats all pure-consistency operations in which no data is requested as simple invalidations. This includes CI and CWI transactions. These invalidations can only be performed on a sub-block (32-byte) basis.

The MXCC does not acknowledge CI. The MXCC can respond to at most one CI every fourth cycle.


17.7.7 MXCC Slave on MBus 


The MXCC contains a number of registers that control the operation of the E-cache, the bus, or other MXCC functions, or that sense the MXCC status. The MXCC behaves as a slave when these registers are read or written from the MBus. The MXCC slave ignores all CR, CRI, CWI, and CI transactions; only Read and Write transactions are supported by the MXCC as a slave.

Read


The MXCC assumes that accesses to these registers are the same size as the register, and it ignores the MBus SIZE bits. During a Read transaction, the contents of a 32-bit register are driven onto MAD[31:0], a 16-bit register drives its contents onto MAD[15:0], and a 64-bit register drives MAD[63:0]. The unused bits return unpredictable data on a Read.


The MXCC responds to a Read no earlier than the A+9 cycle and no later than the A+23 cycle, depending on the register being read.




Write 


The MXCC assumes that Writes to its internal registers are the same size as the register, and it ignores the MBus SIZE bits. During a Write transaction, the contents of a 32-bit register should be driven on MAD[31:0], a 16-bit register on MAD[15:0], and a 64-bit register on MAD[63:0]. The unused bits are ignored on a Write from the MBus.


The MXCC responds to a Write between the A+9 cycle and, at the latest, the 
A+23 cycle, depending on the register written. 

Note: 


Atomic load-store operations to these registers are not supported. A 
timeout error is reported if an atomic load-store is attempted. 


17.7.8 Bus Arbitration 


MBus arbitration is accomplished by an external arbiter. The actual arbitration 
algorithm depends on the implementation of the external arbiter. The MXCC 
asserts MBR when it determines that it requires the MBus. It releases MBR 
immediately after receiving MBG. MBG remains asserted until MBB is 
negated. The MXCC normally releases the MBB signal at the termination of 
the cycle (final acknowledgment); after an error acknowledgment, however, 
MBB remains asserted for a number of cycles while the MXCC completes its 
internal error processing. 


17.7.9 Error and Retry Handling 
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Several different transaction responses are defined by the MBus. Successful 
bus cycles are terminated with the valid data transter acknowledgment 
(MRDY). Unsuccessful transactions may be terminated with several different 
error responses, or retries. As a siave or snooping cache, ће MXCC does not 
drive the MRTY signal and can therefore issue only valid data transfer. bus er- 
ror, and uncorrectable error acknowledgements. As a master, the MXCC de- 
codes acknowledgments according to Table 17-11. 
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Гоу | мну [Description | 
PH | H [moe _______ 
[ H | © | Relinguish and Rety | 
| L | н f Valid Data Transfer 

н [ M [Bus Eror ERROR — 
|_н | E  [WmextEmo(ERRORZ | 
Ге | M [Uneoreciale Eror (ERRORS) 
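Table 17-11 can be expressed as a decoder. In this hypothetical C sketch each argument is true when the corresponding active-low signal is asserted (L). Two encodings are not listed in the table; treating MERR, MRDY, and MRTY all asserted as the MBus retry reply (and so as ERROR3, per the Retry paragraph later in this section) is an assumption based on the MBus specification, not on this table.

```c
#include <stdbool.h>

/* Acknowledgment types from Table 17-11, plus a catch-all. */
enum mbus_ack {
    ACK_IDLE,
    ACK_RELINQUISH_RETRY,
    ACK_VALID_DATA,
    ACK_BUS_ERROR,       /* ERROR1 */
    ACK_TIMEOUT,         /* ERROR2 */
    ACK_UNCORRECTABLE,   /* ERROR3 */
    ACK_RESERVED         /* encoding not used in Table 17-11 */
};

/* Each argument is true when that active-low signal is asserted (L). */
static enum mbus_ack mxcc_decode_ack(bool merr, bool mrdy, bool mrty)
{
    if (!merr) {
        if (!mrdy && !mrty) return ACK_IDLE;
        if (!mrdy && mrty)  return ACK_RELINQUISH_RETRY;
        if (mrdy && !mrty)  return ACK_VALID_DATA;
        return ACK_RESERVED;            /* MRDY and MRTY both asserted */
    }
    if (!mrdy && !mrty) return ACK_BUS_ERROR;
    if (!mrdy && mrty)  return ACK_TIMEOUT;
    if (mrdy && !mrty)  return ACK_UNCORRECTABLE;
    /* Assumption: all three asserted is the MBus retry reply, which the
     * MXCC master treats as equivalent to ERROR3. */
    return ACK_UNCORRECTABLE;
}
```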





Errors 
Three error responses are defined: ERROR1 (Bus Error), ERROR2 
(Timeout), and ERROR3 (Uncorrectable). The MXCC master logs these 
errors in the MXCC error register (CCER) (see Subsection 16.8.1) and, if 
appropriate, notifies the SSP via a VBus error acknowledgment. 
Asynchronous Errors 


The MXCC asserts the AERR signal whenever an asynchronous error occurs. 
Asynchronous errors are errors that occur later in an operation that has 
already been acknowledged on the VBus. These errors are logged in the 
MXCC's error register (see Subsection 16.8.4) before they are reported to the 
system by asserting AERR. 


Relinquish and Retry (R&R) 


The MXCC supports R&R replies only on the first response to any transaction. 
Some caches, including the MXCC's E-cache, can present a problem when 
they receive an R&R acknowledgement: when the write-back of a cache block 
is backed off with an R&R, these caches no longer recognize the block as 
owned. Other system logic must therefore ensure that no other coherent 
transaction occurs to that block until the write succeeds. The problem arises 
because the MXCC does not snoop its write-back buffer. The SSP in MBus 
mode does not have this problem, since it snoops its store buffer. 


Retry 


The MXCC does not support the retry reply. All retries are treated as 
equivalent to ERROR3 replies. The MXCC expects all data returned with a valid 
data transfer (MRDY) acknowledgement to be correct, because this data is used 
immediately (for example, stored in the E-cache or forwarded to the processor). 
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17.8 MBus Timing Summary 


MBus Specification Rev 1.2
Read: A+2 (min) for MRDY
Write: A+2 (min) for MRDY; A+3 for different masters performing back-to-back writes
Coherent Read: A+2 for MIH and/or MSH
  A+6 for MRDY
  A+2 for MRDY if MIH is not asserted
Coherent Read and Invalidate: A+2 for MIH
  No number specified for MRDY
  Need to know when invalidation occurs
Coherent Write and Invalidate: A+2 (min) for MRDY; A+3 for different masters performing back-to-back writes
  Need to know when invalidation occurs
Coherent Invalidate: A+2 for MRDY


SuperSPARC Master Directly on MBus
SuperSPARC asserts the address and accepts the acknowledgement, MIH, MSH, and data.
Read: A+2 (min) for acknowledgement
Write: A+2 (min) for acknowledgement
Coherent Read: cannot accept MIH after the first MRDY (concurrent is acceptable)
  4 cycles (min) after MIH for acknowledgement and data
  A+2 (min) for acknowledgement if no MIH
Coherent Read and Invalidate: cannot accept MIH after the first MRDY (concurrent is acceptable)
  4 cycles (min) after MIH for acknowledgement and data
Coherent Write and Invalidate: not issued by SuperSPARC, because of its write-allocate cache scheme
Coherent Invalidate: A+2 (min) for acknowledgement, if any is sent

Note: SuperSPARC can accept an acknowledgement at A+1, but systems normally
cannot generate acknowledgements that fast.


SuperSPARC as a Snooping Cache Directly on MBus
SuperSPARC accepts the address, asserts the acknowledgement, MIH, and MSH, and sends data.
Read: not valid
Write: not valid
Coherent Read: A+3 for MIH and/or MSH (non-varying)
  A+7 for acknowledgement (non-varying)
Coherent Read and Invalidate: A+3 for MIH and/or MSH (non-varying)
  A+7 for acknowledgement (non-varying)
Coherent Write and Invalidate: treated as Coherent Invalidate
  A+1 (SuperSPARC will not assert MRDY)
Coherent Invalidate: A+1 (no acknowledgement sent out)


SuperSPARC Slave Directly on MBus
Read of Port Register
Read: A+3 (min) for acknowledgement
Write: not valid
Coherent Read: not valid
Coherent Read and Invalidate: not valid
Coherent Write and Invalidate: not valid
Coherent Invalidate: not valid


MXCC Master on MBus
Module asserts the address and accepts the acknowledgement, MIH, MSH, and data.
Read: A+2 (min) for any acknowledgement
Write: A+2 (min) for any acknowledgement
Coherent Read:
  Asynchronous clock:
    A+2 (min) for MIH and/or MSH
    Cannot accept MIH after the first acknowledgement (concurrent is acceptable)
    2 cycles (min) after MIH for acknowledgement and data
  Synchronous clock:
    A+2 (min) for MIH and/or MSH
    Cannot accept MIH after the first acknowledgement (concurrent is acceptable)
    2 cycles (min) after MIH for acknowledgement and data
Coherent Read and Invalidate:
  Asynchronous clock:
    A+2 (min) for MIH and/or MSH
    Cannot accept MIH after the first acknowledgement (concurrent is acceptable)
    2 cycles (min) after MIH for acknowledgement and data
  Synchronous clock:
    A+2 (min) for MIH and/or MSH
    Cannot accept MIH after the first acknowledgement (concurrent is acceptable)
    2 cycles (min) after MIH for acknowledgement and data
Coherent Write and Invalidate: A+2 (min) for acknowledgements
  Note: Block copy and block zero issue CWI.
Coherent Invalidate: A+2 (min) for acknowledgement
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MXCC as а Snooping Cache on MBus 

Module accepts address, assets acknowledgement, MIH, MSH, and sends data 

Read Not valid 

Write Not valid 

Coherent Read Asynchronous Clock: 
A46 + 1 for MIH and/or MSH 
A+14 for acknowledgement (best case) 
A+23 for acknowledgement (worst case) 
Synchronous Clock: 
A+6 for MIH and/or MSH | 
A416 for acknowledgement (best case) 
A+21 for acknowledgement (worst case) 


Coherent Read and Invalidate 
Asynchronous Clock: 
A+6 + 1 for МЇН and/or MSH 
A+14 for acknowledgement (best case) 
A+23 for acknowledgement (worst case) 
Synchronous Clock: 
A+6 for MIH and/or MSH 
A+16 for acknowledgement (best case) 
A+21 for acknowledgement (worst case) 
Coherent Write and Invalidate 
No acknowledgement sent out 
Note: Max of one CWI every forth cycle 


Coherent Invalidate No acknowledgement sent out 
Note: Max of one CI every forth cycle 


MXCC Slave on MBus 
Non-Coherent transactions. MXCC as addressed slave 
Read A49 to A+23 depending on register 
Write A«9 to A«23 depending on register 
Coherent Read Not valid 
Coherent Read and Invalidate 

Not valid 
Coherent Write and Invalidate 

Not valid 


Coherent Invalidate Not valid 











VBus is a non-multiplexed, synchronous, highly pipelined bus for a variety of 
system applications. VBus is an especially efficient connection between the 
SuperSPARC processor (SSP), external cache RAMs, and the MultiCache 
Controller (MXCC) in a SuperSPARC module. 


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18.1 Introduction 


VBus is a non-multiplexed, synchronous, highly pipelined bus that can be used 
for a variety of system applications. VBus has 36 pins for a physical byte 
address and 64 pins for data. VBus is especially tailored to provide an efficient 
connection between the SSP, the MXCC, and external cache in a SuperSPARC 
module. 


VBus always operates synchronously with the SSP, and the processor clock 
provides all timing for the operations of VBus. A single clock generator pro- 
vides a low-skew clock to all devices on a VBus. All signals on VBus are 
sampled on the rising edge of the VBus clock. 


VBus is selected if CCMODE is high (deasserted state) when system reset 
(RESET) transitions from asserted (L) to deasserted (H). CCMODE should not 
be changed while RESET is high (deasserted state), as unpredictable 
operation may result. The selection of MBus or VBus is visible to software as 
the mb bit in the MCNTL register. 


A central VBus arbiter controls access to the bus. A master must have RGRT 
(read grant) to begin a read access or WGHT (write grant) to begin a write ac- 
cess. The MXCC contains an integrated VBus arbiter. 


The following sections introduce the basic transactions on VBus. The interac- 
tion of these transactions with the cache-consistency protocol will also be 
described. 


18.1.1 Synchronous SRAM External Cache 
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In order to understand the operation of VBus, it is helpful to understand the 
function and the organization of the SSP, the MXCC, and SRAMs on the VBus. 
Generally, the SSP is a VBus master that accesses system memory and other 
resources through the MXCC. The bussed address and data lines go to the 
external cache SRAMs and the MXCC. Figure 18-1 shows the general 
scheme of VBus connections. 
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Figure 18-1. VBus Sub-system 
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When a cache hit occurs, the MXCC times the local Static Random Access 
Memory (SRAM) accesses by providing ready responses and write-enable 
gating. The MXCC also controls cache fills and is responsible for writing data 
into the SRAMs when the SSP issues a write to shared data. The MXCC controls 
the processor's access to the VBus with the grant 
lines. 


A 1MB E-cache consists of eight 128Kx8 or 128Kx9 synchronous SRAMs, as 
shown in Figure 18-1. Each DQ[7:0] pin of each SRAM connects to one of the 
D[63:00] pins, so each SRAM supplies one byte of the doubleword. The ninth 
bit of the 128Kx9 SRAMs should be connected to the corresponding DPAR 
signal. The SRAMs' output enables are controlled by the OE signal from either 
the MXCC or the SSP, and their write enables are controlled by WE[7:0] from 
either the MXCC or the SSP. 
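To make the organization concrete, the C sketch below maps a physical byte address onto the eight byte-wide SRAMs: every SRAM sees the same entry index (one entry per doubleword), and the low three address bits select the byte lane. Indexing the SRAMs directly by physical address bits [19:3] is an assumption, tags are ignored, and the big-endian lane order follows the WE[7:0] signal description in Section 18.2. All names are hypothetical.

```c
#include <stdint.h>

#define ECACHE_BYTES (1u << 20)       /* 1 MB E-cache */
#define SRAM_DEPTH   (128u * 1024u)   /* entries per 128Kx8 (or x9) SRAM */
#define BYTE_LANES   8u               /* one SRAM per byte of D[63:0] */

_Static_assert(SRAM_DEPTH * BYTE_LANES == ECACHE_BYTES,
               "eight 128K-entry byte-wide SRAMs make 1 MB");

/* Entry addressed in every SRAM: one entry per doubleword, so the three
 * byte-offset bits are dropped (assumes direct indexing by paddr[19:3]). */
static uint32_t ecache_entry(uint64_t paddr)
{
    return (uint32_t)((paddr >> 3) & (SRAM_DEPTH - 1u));
}

/* Byte lane (and thus SRAM) holding this byte; big-endian, lane 0 = D[63:56]. */
static unsigned ecache_lane(uint64_t paddr)
{
    return (unsigned)(paddr & 7u);
}

/* Bit position in D[63:0] of the least significant bit of a lane. */
static unsigned lane_lsb(unsigned lane)
{
    return 56u - 8u * lane;
}
```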









Synchronous SRAMs have registers on each input and output, as shown in 
Figure 18-2. This allows pipelined operation. An address is presented to an 
SRAM before the active clock edge and is captured in the input register at that 
edge; the result is stored in the output register at the next active edge. 
New addresses can be supplied at each clock edge, and new outputs appear 
after two clock periods of delay. Writing works similarly: address, data, output 
enable, and write enable are registered on the active clock edge, and the data 
is stored into the internal array during the subsequent clock period. 


Figure 18-2. Synchronous SRAM Internal Organization 
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The timing of synchronous SRAM operation is illustrated in Figure 18-3. It 
shows three read cycles, followed by two write cycles, followed by two read 
cycles, all proceeding as quickly as possible. Notice that, when changing from 
reading to writing, two address cycles must pass with OE low before the write 
address and data can be placed on the bus. These two cycles are needed for 
the data from the last read to appear on the data bus. Notice also that the data 
bus, when changing from writing to reading, is idle for two cycles because it 
takes two cycles before the data from the first read is available on the bus. 


Figure 18-3. Synchronous SRAM Timing 
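The two-cycle read latency can be modelled with a short shift register. This C sketch covers only the read path of a hypothetical 16-word SRAM: each call represents one rising clock edge, and an address registered on edge n is returned as data sampled from the bus on edge n+2. The array size, test pattern, and names are illustrative assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define SRAM_WORDS 16   /* hypothetical tiny array for illustration */
#define PIPE_DEPTH 2    /* input register + output register */

struct sync_sram {
    uint8_t mem[SRAM_WORDS];
    int addr_pipe[PIPE_DEPTH];  /* registered read addresses; -1 = empty slot */
};

static void sync_sram_init(struct sync_sram *s)
{
    for (int i = 0; i < SRAM_WORDS; i++)
        s->mem[i] = (uint8_t)(0xA0 + i);   /* recognizable test pattern */
    for (int i = 0; i < PIPE_DEPTH; i++)
        s->addr_pipe[i] = -1;
}

/* One rising clock edge. 'addr' is the read address presented before the
 * edge (-1 for none). Returns true and stores the byte in *out when read
 * data is sampled from the bus on this edge. */
static bool sync_sram_clock(struct sync_sram *s, int addr, uint8_t *out)
{
    int oldest = s->addr_pipe[PIPE_DEPTH - 1];
    for (int i = PIPE_DEPTH - 1; i > 0; i--)
        s->addr_pipe[i] = s->addr_pipe[i - 1];
    s->addr_pipe[0] = addr;
    if (oldest >= 0) {
        *out = s->mem[oldest];
        return true;
    }
    return false;
}
```

Presenting addresses 3 and 5 on back-to-back edges yields mem[3] two edges later and mem[5] on the following edge, which is why the data bus lags the address bus by two cycles in Figure 18-3.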
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Synchronous SRAMs having this organization and sufficient speed to operate 
at the VBus clock frequency may be used with the SSP and MXCC. There are 
several manufacturers of suitable components. 


The timing of the VBus when used to access the synchronous SRAMs is not 
identical to that shown in Figure 18-3, since the MXCC controls access to the 
VBus with WGRT and RGRT and controls the access to the data bus with 
WEE. The actual timing on VBus of a write following a read is shown in 
Figure 18-28. 
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18.1.2 Bus Transactions 


Read and write single transactions are the most basic bus operations. Block 
read and burst write transactions are used to improve VBus bandwidth. Most 
transactions are pipelined to provide higher performance. A swap (atomic 
load/store) operation is defined to provide locked bus transactions. 


A transaction on VBus begins with a command cycle. In a command cycle, the 
address is supplied along with the command status lines. The data lines may 
also be driven for write commands. A command cycle is indicated by the 
CMDS signal. The slave decodes the address and command status signals 
and responds with an acknowledgement alone for write transactions, or with 
data and an acknowledgement for read transactions. 


Burst transactions involve more than one doubleword (DW) and are of variable 
size. In a burst transaction, two or more DWs of data are transferred after a 
single command cycle. The burst continues for one more DW if BURST 
remains asserted, or completes after the next acknowledgement if BURST is 
deasserted. 
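A burst's address phases can be sketched as follows. The C below assumes that consecutive doublewords are transferred in ascending address order, which this section does not specify; its point is only that BURST stays asserted with every address except the last. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One address phase of a burst as driven by the SSP. */
struct burst_beat {
    uint64_t addr;   /* ADDR[35:0] for this doubleword */
    bool burst;      /* BURST asserted with this address */
};

/* Fills 'out' with the address phases of an n-doubleword burst starting
 * at the doubleword-aligned address 'start'. Ascending order is an
 * assumption; BURST is deasserted only on the final address. */
static size_t make_burst(uint64_t start, size_t n, struct burst_beat *out)
{
    for (size_t i = 0; i < n; i++) {
        out[i].addr = start + 8u * i;
        out[i].burst = (i + 1 < n);
    }
    return n;
}
```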


18.1.3 Cache Consistency 


When using the VBus, the processor's internal data cache operates as a 
write-through cache. Since the cache writes through, all modifications that the 
processor makes to cached data appear externally as VBus write transactions. 
These transactions are used to keep all caches (internal caches, external 
caches, and the caches of other processors) consistent with the new changes. 


The SuperSPARC processor's internal caches are kept consistent by 
invalidation. Whenever an external bus master asserts the CMDS and WR signals 
(with DEMAP deasserted), any copies of data cached internally that match the 
address on the bus will be invalidated. 


In systems with external caches, inclusion is normally maintained with the 
external cache. This allows the external cache controller to filter many system 
bus transactions and pass on to the SuperSPARC processor only those that 
require action within the first-level caches. The MXCC works in this way. 


18.1.4 Error Reporting 
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Each VBus transfer is acknowledged with a response. The response indicates 
good completion or one of several types of error. Each transaction has one or 
more responses, depending on the number of DWs transferred. 
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The transfer responses are encoded on the RADY, WADY, RETRY, and MEXC 
signals. This encoding is similar to the encoding defined for MBus error re- 
sponses. RRDY and WRDY are interpreted in the same way, but RRDY ap- 
plies to read transactions and WRDY applies to write transactions. Table 18-1 
defines the encoding of VBus acknowledgements and errors. 


Table 18-1. Bus Transfer Responses 
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The undefined (UD), bus error (BE), timeout (ТО), and uncorrectable[?] (UC) 
errors are not actually interpreted by the logic in the SSP. Each is logged in the 
appropriate bit in the fault status register (MFSR) but generates no other spe- 
аћс actions that differentiate the four errors. Thus, the names given each of 
the four errors are only suggestions, and systems may choose other inter- 
pretations of the four codes. 


Throughout the bus cycle examples in this chapter, a standard error response 
is used when the example shows an error response. The error shown has 
MEXC and either (or both) RRDY or WRDY asserted, which is the encoding 
for an uncorrectable error. Other than retry responses, the error replies are all 
treated in exactly the same manner. The particular error type is decoded and 
used to set the appropriate bits in the MFSR. 


18.1.5 Memory and I/O Transactions 


Memory transactions are generally cacheable; I/O transactions generally are 
not. Cacheable read operations, with the exception of Memory Management 
Unit (MMU) table walk operations, use block read transactions. Non-cacheable 
transactions always use single-cycle transfers. The CCHBL signal is 
asserted during all cacheable transactions. 
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When the SSP needs to perform an extemal bus access, it must have access 
to the system bus. Access is granted to the processor by asserting one or both 
of the bus grant signals RGAT and WGHT. In systems with extemal caches, 
these signals will normally be controlled independently. In simpler Dynamic 
Random Access Memory (DRAM)-based systems, a single composite 
BUSGHT signal may be used by connecting WGRT and КОНТ together. 


The SSP supports lazy arbitration; if the bus grant is already asserted when 
the SSP wants to begin an access, the access will begin immediately. If a grant 
is not present, the BUSREQ signal will be asserted until the bus is granted. 
Once the bus is granted, the SSP may issue an access, as described in the 
following sections. The bus grant must remain active for the duration of the bus 
cycle. If the grant signal is deasserted, the SSP disables its drivers on all 
shared VBus signals on the following cycle. 





Note: 


The SuperSPARC processor begins a transaction on VBus immediately if 
the bus grant signal it needs is already active. Since SuperSPARC can 
begin a bus cycle at the same time external arbitration is removing 
these bus grants, system logic must monitor the CMDS signal to ensure that 
a transaction has not started as the bus grant was removed. Since external 
arbiters should ensure that an empty cycle exists between bus owners, this 
cycle may be used to detect the arbitration collision. 


Once the bus grant has been deasserted, the SSP disables its I/O signals 
on the next cycle. This can interfere with a transaction that has already 
started, although some part of the transaction may have completed. When this 
situation occurs, the SSP's transaction should be retried: when the memory or 
external cache controller detects the arbitration conflict (grant is deasserted 
and CMDS is asserted), it should immediately assert the RETRY signal. This 
forces SuperSPARC to cancel the pending transaction and re-arbitrate for 
use of the bus. See Figure 18-9. 
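The collision check that system logic performs can be sketched in C. The trace format and function names are hypothetical; the rule itself is the one stated in the note: if CMDS is asserted on an edge where the grant is no longer asserted, RETRY should be asserted.

```c
#include <stdbool.h>
#include <stddef.h>

/* One VBus clock edge as seen by the memory or external-cache
 * controller; true means the (active-low) signal is asserted. */
struct edge_sample {
    bool grant;  /* the RGRT or WGRT line needed for the SSP's access */
    bool cmds;   /* CMDS driven by the SSP */
};

/* Arbitration collision: the SSP started a command on an edge where its
 * grant had already been removed, so RETRY should be asserted. */
static bool needs_retry(struct edge_sample s)
{
    return s.cmds && !s.grant;
}

/* Returns the index of the first edge that needs RETRY, or -1 if none. */
static long first_collision(const struct edge_sample *trace, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (needs_retry(trace[i]))
            return (long)i;
    return -1;
}
```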





The SSP also samples the OE pin to prevent collisions with returning read data 
from reads generated by the external cache controller. Although it is generally 
legal for the SSP to overlap write cycles with outstanding read miss requests, 
this can cause a data drive conflict when the external cache controller is 
reading from the SRAM and the SSP tries to begin a store transaction. The SSP 
does not initiate driving the data bus if the OE signal was asserted externally 
on the previous cycle. This case should generally be covered by arbitration but 
is included for extra protection. 
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18.1.7 MMU Consistency 


Demap transactions are used by the SSP to flush one or more pages from its 
MMU's translation lookaside buffer (TLB). Internally, these are generated by 
reference MMU flush operations (see Section 8.8). While processor demap 
operations do not perform bus transactions in MBus systems, in a VBus config- 
uration internal demap operations also generate VBus demap transactions. 
The purpose of a VBus demap transaction is to communicate demap opera- 
tions from the SSP to other TLBs in the system or to communicate demap op- 
erations originating elsewhere in the system to the local SSP. 
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18.2 VBus Signals 
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The SSP's interface to the system when VBus is selected is 143 signals plus 
the processor (and VBus) clock VCLK. These signals use CMOS levels. All 
signals are fully synchronous and are sampled on the rising edge of the VBus 


clock, VCLK. 


Most VBus bidirectional signals have bus keepers, which are low-power 
amplifiers that keep the signal pins from drifting away from valid high or low 
levels when all drivers are in a high-impedance state. 


ADDR[35:0] Address. These 36 signals provide the physical byte 
addresses for all VBus transactions. They are 
sampled when CMDS is asserted and on successive 
cycles if BURST is asserted. These signals are I/Os 
on both the SSP and the MXCC. 


DATA[63:0] Data. These are the 64 data bits for VBus 
transactions. Bytewise parity is computed and appears on 
DPAR[7:0]. These signals are I/Os on both the SSP 
and the MXCC. 


ARDY This signal is an input to the SSP. When used with the 
MXCC, ARDY should be tied low, as the MXCC is 
always ready to accept addresses or bus cycles. The 
MXCC has no ARDY pin. It is recommended that this 
signal always be tied low. 


BURST This signal is an output from the SSP and an input to 
the MXCC. It indicates that the current address on 
the bus is part of a burst bus cycle. BURST is 
asserted at the same time as ADDR[35:0]; it is asserted 
through both read and write bursts. BURST is 
deasserted on the last address of a burst to allow the 
MXCC to stop returning RRDY or WRDY with the last 
data of the burst. 


CCHBL This signal is an output from the SSP and an input to 
the MXCC. It indicates that the current transaction is 
cacheable in an external cache. This signal is 
sampled by the MXCC only when CMDS is asserted by 
the SSP. 
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CMDS Command Strobe. The VBus master asserts this sig- 
nal for one cycle to begin all transactions. If the SSP 
isnotthe bus master, the MXCC asserts CMDS as an 
input to the SSP to initiate external snoop transac- 
tions (including both demaps and invalidates). With 
the MXCC as VBus master, CMDS high indicates 
that no command word is present on the VBus. 
CMDS low indicates an MXCC-initiated VBus invali- 
date or demap command word on ADDR[35:0], DE- 
MAP, and WR. 


Whenthe SSP is the bus master, it asserts this signal 
for the first cycle of a VBus transaction. In this case, 
CMDS high indicates that no command word is pres- 
ent on the VBus. CMDS low indicates a Super- 
SPARC initiated VBus command word on 
ADDR[35:0, ССНВГ, CSA, DEMAP, 0057, 
SIZE[1:0], SU, RD, and WR. 


CSA Control Space Access. The SSP asserts this signal 
when performing a read or write to the MXCC control 
space through ASI 0x02. The E-cache tag RAM, 
E-cache data, and internal MXCC registers are 
accessible through ASI 0x02. When CSA is high, a 
normal memory access is indicated, whereas when CSA 
is low a control space access is indicated. 


DEMAP Demap an address translation. DEMAP is asserted 
along with CMDS to indicate a demap cycle. This 
cycle can be initiated either by the system (as 
communicated via the MXCC) or by the local SSP. The 
DATA[63:0] lines contain a demap data word, which 
indicates which virtual address translations are to be 
discarded. 


When DEMAP is input to the SSP (output from the 
MXCC), any entries in the SSP’s TLB matching the 
request will be removed. 
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When the SSP initiates a DEMAP transaction, 
translation hardware in the system should remove 
TLB entries matching the request. When DEMAP is 
input into the MXCC and WR asserted, a demap re- 
questis passed from SuperSPARC tothe MXCC and 
out to the system bus. A DEMAP assertion input into 
the MXCC, with РО asserted, indicates that Super- 
SPARC has successfully completed a demap opera- 
tion initiated by the system bus (through the MXCC). 


DEMAP is sampled only when CMDS is asserted. 


DPAR[7:0] Data bus parity. When parity is enabled, even parity 
is generated and checked. The correspondence of 
DPAR bits to the DATA bits checked are shown in 
Table 18-2. 


Table 18-2. DATA Bits Checked by DPAR Bits 












ERROR Processor Error. This signal is asserted by the SSP 
to indicate that the processor has entered error mode 
and will take a watchdog reset trap. The MXCC initi- 
ates an internal error when ERROR is asserted. See 
Section 12.3. 


IRL[3:0] Interrupt Request Level. This field specifies the level 
of the highest priority interrupt request that is 
currently pending. It is an input to the SSP. If IRL[3:0] = 0000, 
no interrupts are pending. Level 15 (IRL[3:0] = 1111) 
indicates a non-maskable interrupt (NMI) that cannot 
be masked by PSR.PIL; level 14 is the highest priority 
maskable interrupt; level 1 is the lowest priority 
maskable interrupt. 
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Table 18-3, VBus Acknowledgements 







LDST This signal indicates that an atomic Load/Store 
operation (LDSTUB, LDSTUBA, SWAP, or SWAPA) 
has been initiated by the SSP. LDST is the equivalent 
of a logical OR of the RD and WR signals. 


MEXC Memory Exception. This signal is output from the 
MXCC to the SSP and is asserted when the MXCC 
cannot return or accept the requested data. This 
signal may cause the SSP to take an exception trap. 
MEXC is encoded, along with RRDY or WRDY, and 
RETRY to indicate the type of acknowledgement for 
a transaction initiated by the SSP (see Table 18-3). 





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SRAM Output Enable. As an output from either the 
SuperSPARC processor or the МХСС, this signal 
controls the pipelined output enable of the extemal 
cache SRAMs. OE is used as an input by the SSP to 
prevent bus collisions. 


PEND Pending. This signal is output from the MXCC to 
inform the SSP that there is at least one outstanding 
write operation that has not completed. PEND is as- 
serted by the MXCC when it has a store operation 
pending internally or on the system bus. PEND is 
used by the SSP to support the PSO memory model 
(see Section 7.7). 
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SIZE[1:0] 


Table 18—4. Size Encodings 
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Read. The RD signal is driven to qualify addresses 
on the VBus as read cycles. It is also asserted by the 
SSP, along with WFi and LDST, to indicate an atomic 
Load/Store cycle. The SSP uses HD as an input for 
factory SRAM test only. RD is sampled only when 
CMDS is asserted. 


Reset. This input forces the SSP to perform hard- 
ware reset. See Chapter 12. 


Retry. This signal is encoded, along with RRDY/WR- 
DY and МЕХС, to indicate the type of acknowledge- 
ment on VBus. See Table 18-3. if the MXCC asserts 
this signal before НАБУ or WARDY is asserted for an 
access, the processor terminates the current access 
and restarts it once it reacquires the VBus (if a read 
is pending, a write will not be retried until after the 
read has completed). 


Read Grant. This signal grants read access on VBus 
to the SSP. 


Read Ready. This signal indicates to the SSP that 
read data is valid. When RRDY is asserted, the SSP 
may reliably sample the incoming data on the same 
clock edge as НАБУ. This signal is used to qualify 
data specifically for a read access, since a write may 
aiso be pending. This signal is encoded with MEXC 
and RETRY to provide the MXCC transaction ac- 
knowledgements, as in Table 18-3. 


These bits indicate to the MXCC the size of the cur- 
rent VBus transaction initiated by the SSP. They are 
outputs only. This field is only sampled when CMDS 
is asserted. The encoding of the SIZE field is shown 
in Table 18-4. 
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Supervisor Access. This signal is asserted by Super- 
SPARC with CMDS when the access was initiated 
with PSR.S=1 (in supervisor mode). 


WE[7:0] SRAM Write Enables. These signals directly control 
the write-enable signals of the synchronous SRAMs 
used for the external cache. These signals are only 
driven when asserted; otherwise, they are in a 
high-impedance state. WE bit ordering conforms to the 
big-endian convention (WE0 is the write enable for 
byte 0, DATA[63:56]). 


WEE E-cache write enable enable. The SSP cannot place 
write data onto DATA[63:0] and DPAR[7:0] or drive 
the WE[7:0] pins until permitted by the MXCC using 
WEE. During a write transaction on VBus, the SSP 
will wait for WEE to be asserted before using those 
signals on VBus. Once WEE has been asserted, the 
SSP has permission to use VBus for writing for the 
duration of the burst if needed. See Subsection 
18.3.7 for figures illustrating the use of WEE. 


WGRT Write Grant. This signal is output by the MXCC and 
allows the SSP to begin a write access on the VBus. 


WR Write. The SSP drives the WR signal to qualify an 
address on VBus as a write cycle. When WR is driven by 
the SSP along with DEMAP, a demap request is sent to 
the system bus. When WR is driven by the SSP 
along with RD and LDST, an atomic Load/Store 
transaction is sent. The MXCC outputs WR, with 
an address on ADDR[35:0], to invalidate lines in 
SuperSPARC's internal caches that contain that 
address. 


WRDY Write Ready. This signal notifies the SSP that the 
MXCC has sampled the SSP's write data, and the 
processor may generate the next access. In the case 
of burst writes, the processor switches address and 
data for the next write within the burst on the same 
clock edge on which WRDY is asserted. This signal 
is used to qualify data specifically for a write access, 
since a read may also be pending. This signal 
is encoded with MEXC and RETRY to provide the 
MXCC transaction acknowledgements, as in 
Table 18-3. 
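The WE[7:0] and DPAR[7:0] descriptions both work on byte lanes, so one hedged C sketch covers both: a write-enable pattern for a naturally aligned store, and per-byte even parity. Bit i of each result refers to byte lane i in the big-endian order given for WE[7:0] (lane 0 is DATA[63:56]); which DPAR pin carries each lane is defined by Table 18-2 and is not assumed here. Function names are hypothetical.

```c
#include <stdint.h>

/* WE[7:0] pattern for a naturally aligned store of 'size' bytes
 * (1, 2, 4, or 8). Bit i of the result is WEi, which enables byte i of
 * the doubleword in big-endian order (WE0 -> DATA[63:56]). */
static uint8_t we_pattern(uint64_t paddr, unsigned size)
{
    unsigned first = (unsigned)(paddr & 7u);      /* first byte written */
    unsigned lanes = (1u << size) - 1u;           /* 'size' adjacent lanes */
    return (uint8_t)(lanes << first);
}

/* Even-parity bit for one byte: 1 when the byte holds an odd number of
 * ones, so that byte plus parity bit always has an even count. */
static unsigned even_parity_bit(uint8_t b)
{
    b ^= (uint8_t)(b >> 4);
    b ^= (uint8_t)(b >> 2);
    b ^= (uint8_t)(b >> 1);
    return b & 1u;
}

/* Parity for all eight lanes; bit i covers lane i (lane 0 = DATA[63:56]).
 * Which DPAR pin carries each lane is defined by Table 18-2. */
static uint8_t lane_parity(uint64_t data)
{
    uint8_t p = 0;
    for (unsigned lane = 0; lane < 8u; lane++) {
        uint8_t byte = (uint8_t)(data >> (56u - 8u * lane));
        p |= (uint8_t)(even_parity_bit(byte) << lane);
    }
    return p;
}
```

For example, a word store to offset 4 of a doubleword enables WE4 through WE7, which cover DATA[31:0].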
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Table 18-5 summarizes the VBus signals and which devices use the signal as 
an input, output, or I/O. Entries labelled O/Z are outputs that are high-imped- 


ance except when asserting the signal or during the 1/2 cycle restoration of the 
deasserted level on the bus. 


Table 18-5. Summary of VBus Signals 
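The IRL[3:0] levels described in this section interact with PSR.PIL. The C below is a sketch of the standard SPARC V8 acceptance rule, not of SuperSPARC's internal logic: an interrupt is taken only with traps enabled (PSR.ET = 1), and its level must exceed PSR.PIL, except that level 15 ignores PIL. The function name is hypothetical.

```c
#include <stdbool.h>

/* SPARC V8 interrupt acceptance for a pending IRL[3:0] request. */
static bool interrupt_taken(unsigned irl, unsigned pil, bool et)
{
    irl &= 0xFu;
    if (irl == 0)
        return false;             /* 0000: no interrupt pending */
    if (!et)
        return false;             /* traps disabled */
    return irl == 15u || irl > pil;   /* level 15 cannot be masked by PIL */
}
```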
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18.3 VBus Transactions and Waveforms 


In order to understand the following waveforms, it is helpful to understand the 
function and the organization of the SSP, the MXCC, and SRAMs on the VBus. 
Generally, the SSP is a VBus master that accesses system memory and other 
resources through the MXCC. The bussed address and data lines go to the 
external cache SRAMs and the MXCC. 


When a cache hit occurs, the MXCC times the local SRAM accesses by 
providing ready responses and write-enable gating (WEE). The MXCC also 
controls cache fills and is responsible for writing data into the SRAMs when the 
SSP issues a write to shared data. The MXCC controls the processor's access 
to the VBus with the grant lines RGRT and WGRT. The MXCC deasserts the 
grants when it needs access to the SRAMs (for either a read or a write), when 
it needs to issue system DEMAP requests or system DEMAP replies, or when 
it needs to invalidate entries in the SSP's internal caches. The MXCC VBus 
cycles are not normal master/slave bus cycles, because no slave device on the 
VBus returns an acknowledgement to the MXCC. 


A 1-Mbyte E-cache consists of eight 128Kx8 or 128Kx9 synchronous SRAMs.
Each SRAM's DQ[7:0] pins connect to eight of the D[63:0] pins. The
ninth bit of each 128Kx9 SRAM should be connected to the corresponding
DPAR[7:0] signal. The SRAMs' output enables are controlled by the OE signal
from either the MXCC or the SuperSPARC processor; their write enables are
controlled by WE[7:0] from either the MXCC or the SSP.
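The capacity follows from the organization above: eight SRAMs, each 128K locations deep and one byte wide, hold 8 x 128K = 1 Mbyte of data. The Python sketch below models this under an assumption the text does not spell out: that SRAM n supplies data byte lane D[8n+7:8n] and, for 128Kx9 parts, DPAR[n]. The constant and function names are hypothetical.

```python
SRAM_DEPTH = 128 * 1024   # locations per 128Kx8 / 128Kx9 SRAM
SRAM_COUNT = 8            # one SRAM per byte of the 64-bit data bus


def ecache_data_bytes():
    """Data capacity of the E-cache in bytes (parity bits not counted)."""
    return SRAM_DEPTH * SRAM_COUNT  # one byte per SRAM per location


def sram_lane(n):
    """((high, low) D-bus bits, DPAR bit) for SRAM n under the ASSUMED mapping."""
    if not 0 <= n < SRAM_COUNT:
        raise ValueError("SRAM index must be 0..7")
    return (8 * n + 7, 8 * n), n


def bytes_per_sram(doubleword):
    """Split a 64-bit D[63:0] value into the byte each SRAM stores (index = n)."""
    return [(doubleword >> (8 * n)) & 0xFF for n in range(SRAM_COUNT)]
```

For example, `bytes_per_sram(0x0102030405060708)` places 0x08 in SRAM 0 and 0x01 in SRAM 7 under this assumed lane order.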


18.3.1 VBus Waveforms 


All signals on the VBus are sampled on the rising edge of the internal VBus
clock signal (VCLK on the SSP and PCLK on the MXCC). When PLLBYP is
asserted, the internal clock is a delayed (by about 4.5 ns) version of the VBus
clock as presented to the VCLK or PCLK pin. When PLLBYP is deasserted,
the internal clock is generated by the phase-locked loop to be coincident with
the rising edge of the VCLK or PCLK pin. See Chapter 21.


The MXCC controls access to the VBus through the RGRT and WGRT lines.
SuperSPARC may access the VBus for a read when RGRT is asserted and may
access the bus for a write when WGRT is asserted.




Subject to Change Without Notice 





VBus transactions begin when CMDS is asserted and ADDR[35:0] and
SIZE[1:0] are driven. Other signals (RD, WR, CCHBL, BURST, LDST, CSA,
DPAR[7:0], and DATA[63:0]) are also driven as required for the type of cycle.
Since bus keepers maintain undriven signals at a valid level (either high or
low), the SSP drives only the signals that need to be asserted. Shared signals
that are pulsed (for example, CMDS) are driven to the deasserted level for one
half cycle before being released to undriven. During writes, ADDR[35:0],
SIZE[1:0], D[63:0], RD, WR, CCHBL, and DPAR[7:0] are held valid until
acknowledged by WRDY.
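The half-cycle deassertion of pulsed shared signals matters because a bus keeper simply holds the last level driven onto a wire. The toy model below, a hypothetical illustration rather than a VBus timing model, represents each half cycle as 'A' (driven asserted), 'D' (driven to the deasserted level), or 'Z' (released).

```python
def pulsed_driver_states(assert_cycles):
    """Per-half-cycle driver state for a shared pulsed signal such as CMDS.

    The signal is driven asserted for whole cycles, then actively driven to
    the deasserted level for one half cycle, then released.
    """
    return ["A"] * (2 * assert_cycles) + ["D", "Z"]


def wire_levels(driver_states, initial="D"):
    """Level seen on the wire each half cycle; while nobody drives the signal,
    the bus keeper holds whatever level was last driven."""
    level, levels = initial, []
    for state in driver_states:
        if state != "Z":
            level = state
        levels.append(level)
    return levels
```

`wire_levels(["A", "A", "D", "Z"])` leaves the wire deasserted after release, whereas `wire_levels(["A", "A", "Z"])` shows the keeper holding the signal asserted, which is the situation the one-half-cycle deassertion avoids.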


In the following waveforms, the ADDR, DATA, and WE buses are shown as 
single signals to simplify the diagrams. 
18.3.2 Cacheable Single Read Hit


Figure 18-4 shows a read by the SSP of a single cacheable word with an
E-cache hit. The processor asserts the address, the cycle qualifiers, and OE
to the SRAMs. The MXCC detects an E-cache tag match and issues RRDY at the
same time that the SRAMs drive data to SuperSPARC. The OE from the SSP
is delayed in the registers internal to the synchronous SRAMs, and the data
is enabled two cycles after OE is issued to the SRAMs. Note that the partially
bussed VBus control signals are actively deasserted for 1/2 cycle before
being released to the bus keepers.


Figure 18-4. VBus-Cacheable Single Read Hit

18.3.3 Cacheable Single Read Miss
Figure 18-5 shows a cacheable single-read miss. The MXCC detects a
tag mismatch and issues a cycle to the system bus to obtain data to fill
the E-cache. It removes RGRT to allow SuperSPARC to proceed with any write
operation it may have pending. When the system bus returns the requested
data block, the MXCC removes the bus grant from SuperSPARC (negates
WGRT) to obtain access to the SRAMs. The MXCC writes the data into the
SRAMs and issues RRDY to SuperSPARC as the data word requested
by the SuperSPARC read is driven on the DATA lines (while the data is
being written into the SRAMs).

Figure 18-5. VBus-Cacheable Single Read Miss
18.3.4 Burst Read Hit
Figure 18-6 shows a burst read hit. As with a cacheable single read hit, the
MXCC functions mainly to time the cycle by asserting RRDY as the SRAM
provides the data.


Figure 18-6. VBus Burst Read Hit 
Overlapped Burst Read Hit


Figure 18-7 shows a second burst read cycle overlapping the first. In this
example, SuperSPARC issues CMDS and the next burst address as soon as the
last of the previous addresses has been sent. The earliest the overlapping cycle
can occur is one cycle after the first ready (WRDY or RRDY) has been received
by SuperSPARC. In Figure 18-7, the overlap is regulated by the availability of
the address lines.


Figure 18-7. VBus-Overlapped Burst Read Hit 
18.3.5 Burst Read Miss
Figure 18-8 shows a burst read miss. The MXCC removes RGRT to indicate
that the cycle is in progress and that SuperSPARC can proceed with an
outstanding write if one is pending. When the data returns from the system bus,
the MXCC writes it into the SRAM and asserts RRDY when the requested data
is on the VBus. Note that, in Figure 18-8, the MXCC is in XBus configuration,
and consequently the block size is 64 bytes. Only 32 bytes are sent to
SuperSPARC, while all 64 bytes are stored in SRAM. Also note that, with
critical-word-first ordering, the data returned starts from the index into the
block for the requested doubleword, continues to the last index, and then wraps
from index 0 to the starting index minus 1.

Figure 18-8. VBus Burst Read Miss

Note: Reads use critical-word-first ordering. In this example, word two of the block is returned first.
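The critical-word-first order described above can be computed directly. This is a minimal sketch that assumes 8-byte doublewords and the block sizes given in this section (32 bytes for MBus, 64 bytes for XBus); the function names are hypothetical. For the example in the note, a 32-byte block with doubleword two requested, the delivery order is 2, 3, 0, 1.

```python
def critical_word_first(start_index, block_bytes=32, unit_bytes=8):
    """Order of doubleword indices delivered for a cache-block fill.

    Delivery starts at the requested doubleword, runs to the last index,
    then wraps from index 0 to start_index - 1.
    block_bytes: 32 for an MBus configuration, 64 for an XBus configuration.
    """
    count = block_bytes // unit_bytes
    if not 0 <= start_index < count:
        raise ValueError("start_index outside the block")
    return [(start_index + i) % count for i in range(count)]


def doubleword_index(address, block_bytes=32, unit_bytes=8):
    """Index of the doubleword containing `address` within its cache block."""
    return (address % block_bytes) // unit_bytes
```

A read of address 0x1010 in a 32-byte block requests doubleword 2, so `critical_word_first(doubleword_index(0x1010))` gives the same order as the note's example.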
Figure 18-9 shows a burst read miss that detects a VBus operation at the
same time it is removing write grant. The MXCC must issue a retry and wait
an extra clock to allow the SSP to get off the bus. Figure 18-9 depicts an MBus
configuration (i.e., 32 bytes of data are returned from the system bus). The
interruption in the transaction can occur because of the difference in clock
frequencies between the VBus and MBus.


Figure 18-9. VBus Read Miss (With interference from the SuperSPARC processor) 
18.3.6 Cache Disable Read


Figure 18-10 shows a single read with the cache disabled. The MXCC goes
to the system bus to accomplish this operation. It deasserts RGRT to allow
SuperSPARC to complete pending write operations. When the data is available,
the MXCC negates grant, drives the data, and asserts RRDY.


Figure 18-10. VBus Cache Disable Read 
18.3.7 Cacheable Write Hit


Figure 18-11 shows a cacheable single-write hit. The MXCC asserts WEE in
the CMDS+2 cycle (i.e., two cycles after CMDS) to allow the assertion of the
write data (DATA, DPAR) and strobes (WE[7:0]). The MXCC asserts
WRDY in the following cycle (CMDS+3).


Figure 18-11. VBus-Cacheable Single Write Hit 







Figure 18-12 shows a burst write hit. It is basically the same except that WRDY
is asserted for each data doubleword written in the burst. The SSP deasserts
BURST one cycle before the last write. Each of the individual writes in the burst
from the SSP may be from one to eight bytes and may be at any address within
the cache block, and a burst may contain any number of consecutive writes.
If the MXCC needs the VBus while a burst write cycle is occurring, it can
deassert the WGRT signal to terminate the burst cycle prematurely, as shown in
Figure 18-13. When the SSP reacquires the VBus, it continues the burst write
from where it was interrupted.
Figure 18-12. VBus-Cacheable Write Hit

Figure 18-13. VBus-Cacheable Burst Write Hit Abort
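The resume-after-abort behavior of burst writes can be sketched as a simple partitioning of the writes the SSP wants to perform. This hypothetical model tracks only which writes land in each bus tenure (each period during which the SSP holds WGRT); it ignores cycle timing and the BURST signal, and the caller supplies how many writes the MXCC acknowledges with WRDY before deasserting WGRT.

```python
def split_burst(writes, acks_before_abort):
    """Split one logical burst write into the VBus tenures it actually uses.

    writes            -- doublewords the SSP wants to write, in order
    acks_before_abort -- for each tenure the MXCC cuts short, the number of
                         writes acknowledged with WRDY before WGRT is removed
    After the last abort, the SSP reacquires the bus and finishes the burst.
    """
    tenures, pos = [], 0
    for acked in acks_before_abort:
        if acked < 0:
            raise ValueError("acknowledgement count cannot be negative")
        end = min(len(writes), pos + acked)
        if end > pos:
            tenures.append(writes[pos:end])
        pos = end  # resume from the first unacknowledged write
    if pos < len(writes):
        tenures.append(writes[pos:])
    return tenures
```

For instance, a five-write burst cut after two acknowledgements becomes two tenures, `["a", "b"]` and `["c", "d", "e"]`, with no write repeated or skipped.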
18.3.8 Shared Write


Figure 18-14 shows a burst write to shared data. The MXCC does not assert
WEE, thus preventing the SSP from writing to the SRAMs. This signal sequence
also interrupts the burst cycle. The SSP considers the cycle over; the MXCC
assumes responsibility for updating the SRAMs (with the data provided by
SuperSPARC at CMDS) after it has completed a shared write command on the
system bus. In MBus configurations, the shared write on the system bus is
actually a Coherent Invalidate (CI), which makes the MXCC cache
block exclusive (subsequent cycles to the same block won't be shared). In
XBus configurations, a shared write is sent on the system bus, and it is
possible that each of the individual addresses in the VBus burst write causes a
shared write cycle (i.e., the cache block may not become exclusive).
Figure 18-14. VBus Shared Write

Figure 18-14. VBus Shared Write (Continued)


18.3.9 Cache-Disable Write or Non-Cacheable Write
Figure 18-15 shows a cache-disable write. The MXCC terminates the VBus 
cycle by issuing a WRDY without asserting WEE. A non-cacheable write would 
be identical. 


Figure 18-15. VBus Cache-Disable (or Non-Cacheable) Write


Figure 18-16 shows a burst write cycle when the cache is disabled. The
MXCC terminates the cycle and interrupts the burst operation by issuing
a WRDY without asserting WEE.


Figure 18-16. VBus Cache-Disable Burst Write





18.3.10 Invalidate 


Figure 18-17 shows an invalidate. The MXCC first removes the SSP from the
VBus by revoking the RGRT and WGRT bus grants; it then asserts the
address, WR, and CMDS. Multiple invalidates may occur consecutively.
Invalidates may also occur when the MXCC has obtained the VBus for SRAM
reads or writes.


Figure 18-17. VBus Invalidation


18.3.11 Demap Transactions


Demap transactions are used to remove virtual address translations from the
MMUs of processors on the VBus and to communicate the removal of translations
by the local processor to the rest of the system. MBus does not support
demap operations on the system bus: in MBus configurations, locally initiated
demap operations on the VBus are ignored by the MXCC, and the MXCC never
generates demap operations on the VBus.


The information transmitted for a demap transaction includes the virtual address
and type, which the MMU uses as criteria to match pages in the TLB for
removal. The context to be used for the demap is broadcast in bits 47 through 32.
The lower 32 bits are equivalent to the data format of the demap operation (see
Subsection 8.8.2). The exact format of the DATA[63:0] signals is shown in
Figure 18-18. Bit 11, bits 0 through 7, and bits 48 through 63 are reserved.


Figure 18-18. VBus Demap Data Format

Bits 63:48 reserved | 47:32 CONTEXT | 31:12 VDA | 11 reserved | 10:8 TYPE | 7:0 reserved1


reserved Unused. Sourced by the SSP as zero.


CONTEXT Context to demap. The context field is compared with
the context portion of the TLB entry tag in all
participating MMUs. The match is used according to the
demap type.


VDA Virtual demap address. All or part of the VDA field is
compared to the virtual address portion of the TLB
entry tag in all participating MMUs. The match is used
according to the demap type.


TYPE Demap type. The demap type is encoded the same
as MMU demap requests and shown in Table 18-6.


reserved1 Unused. Sourced by the SSP as zero.
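As a sketch of the DATA[63:0] layout in Figure 18-18, the helpers below pack and unpack a demap word in Python. The TYPE encodings themselves are defined with the MMU demap operation (Subsection 8.8.2) and are not repeated here, so the sketch treats TYPE as a raw 3-bit value; the function names and the example values are hypothetical.

```python
RESERVED_MASK = 0xFFFF_0000_0000_08FF  # bits 63:48, bit 11, bits 7:0


def pack_demap(context, vaddr, demap_type):
    """Assemble DATA[63:0] for a VBus demap transaction.

    CONTEXT goes in bits 47:32, VA[31:12] stays in place as VDA, TYPE goes
    in bits 10:8, and every reserved bit is sent as zero.
    """
    if not 0 <= context <= 0xFFFF:
        raise ValueError("CONTEXT is 16 bits (47:32)")
    if not 0 <= demap_type <= 0x7:
        raise ValueError("TYPE is 3 bits (10:8)")
    return (context << 32) | (vaddr & 0xFFFF_F000) | (demap_type << 8)


def unpack_demap(data):
    """Split DATA[63:0] into fields and report whether reserved bits are zero."""
    return {
        "context": (data >> 32) & 0xFFFF,
        "vda": data & 0xFFFF_F000,  # VDA kept in place, bits 31:12
        "type": (data >> 8) & 0x7,
        "reserved_zero": (data & RESERVED_MASK) == 0,
    }
```

With these helpers, `pack_demap(0x1234, 0xDEADB123, 1)` yields 0x00001234DEADB100; the low 12 address bits are dropped because only VA[31:12] is carried.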


The TYPE field controls the operation of the demap. Any TLB entries in all
participating MMUs that match the hit criteria for the particular demap type
should be invalidated. Table 18-6 shows the demap hit criteria for bus demaps. A
demap request must meet the access, virtual address, and level criteria of a TLB
entry for the type of the demap request for the entry to be a demap hit. For
demap types other than page, multiple entries may hit within a single TLB.
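The hit test in Table 18-6 can be expressed as a predicate over a TLB entry. This is a hedged sketch: the TlbEntry type and the function name are hypothetical, levels follow the table (3 = 4K-byte page, 2 = 256K-byte segment, 1 = 16M-byte region), and the context-type row is assumed to read "CONTEXT = context and ACC < 6", that is, context demaps leave ACC 6 and ACC 7 entries alone.

```python
from dataclasses import dataclass


@dataclass
class TlbEntry:
    context: int  # context number in the entry tag
    vaddr: int    # virtual address in the entry tag
    level: int    # 3 = 4K page, 2 = 256K segment, 1 = 16M region
    acc: int      # ACC field of the PTE (0..7)


# Address bits compared and levels that may hit, per Table 18-6.
ADDRESS_TYPES = {
    "page": (0xFFFF_F000, {3}),           # VA[31:12]
    "segment": (0xFFFC_0000, {3, 2}),     # VA[31:18]
    "region": (0xFF00_0000, {3, 2, 1}),   # VA[31:24]
}


def demap_hit(entry, demap_type, context, vda):
    """True if a bus demap of the given type hits this TLB entry."""
    if demap_type == "context":
        # ASSUMPTION: context row read as "CONTEXT = context and ACC < 6";
        # address and level always hit for a context demap.
        return entry.context == context and entry.acc < 6
    mask, levels = ADDRESS_TYPES[demap_type]
    access_ok = entry.context == context or entry.acc in (6, 7)
    address_ok = (entry.vaddr & mask) == (vda & mask)
    return access_ok and address_ok and entry.level in levels
```

Note how ACC 6 and 7 entries match any context for page, segment, and region demaps, while a page demap never removes a segment-level (level 2) entry.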
Table 18-6. Demap Hit Criteria for Bus Demap Operations


Type                  Must meet all three criteria:
                      Access                              Virtual Address             Level
Page - 4K-byte        CONTEXT = context or ACC = 6 or 7   Tag.Addr = VDA in [31:12]   level = 3
Segment - 256K-byte   CONTEXT = context or ACC = 6 or 7   Tag.Addr = VDA in [31:18]   level = 3 or 2
Region - 16M-byte     CONTEXT = context or ACC = 6 or 7   Tag.Addr = VDA in [31:24]   level = 3, 2, or 1
Context - 4G-byte     CONTEXT = context and ACC < 6       always hit                  always hit


In addition to broadcasting this demap transaction to the rest of the system,
the SSP can receive demap transactions originating elsewhere in the system.
When a demap is received, the processor executes the demap as if it had been
generated internally, but it uses the provided context instead of the current
internal context. The SSP requires a single ready reply for the demap operation.
It is the system hardware's responsibility to ensure that the demap is broadcast
to and completed by all MMUs in the system. Incoming demaps use a
two-phase request/reply protocol.


Note:


If broadcast demaps are used, only a single demap transaction may be
pending in the system at any one time. System software is responsible for
maintaining this. Indeterminate operation may result if multiple demaps from
any processors are in progress at the same time. See Subsection 7.5.3.
Figure 18-19 shows a demap operation on the VBus initiated by the MXCC in
response to an XBus demap request packet. The MXCC never issues demap
operations on the VBus if it is in the MBus configuration.


Figure 18-19. VBus MXCC-Initiated Demap


The first phase is the external Demap Request. The MXCC obtains the bus,
asserts CMDS and DEMAP, and drives demap data onto the data bus according
to Figure 18-18. The MXCC supplies an address of zero on
the ADDR lines.


The second phase is the reply to that request. The SSP asserts CMDS, RD,
and DEMAP. The MXCC responds by asserting RRDY. There may be other
bus activity between the request and reply.
Figure 18-20 shows a demap initiated by the local SSP. To initiate the demap,
the SSP asserts WR and DEMAP with CMDS and drives the demap data in the
format shown in Figure 18-18. The ADDR signals are all zero during this cycle.
The MXCC sends two replies to this request. The first reply (WRDY)
acknowledges receipt of the demap request. The second reply, signalled with
RRDY, informs the processor that the demap has completed successfully across
the system.


Figure 18-20. VBus SuperSPARC-Initiated Demap 
The MXCC may reply to a demap request with responses other than WRDY.
If the reply is RETRY, the demap operation is retried on the VBus until it
succeeds or fails. If the reply is one of the error replies, the error is reported
to the STA instruction that generated the DEMAP, just as it would be for any
other memory access instruction.
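The reply sequence for an SSP-initiated demap can be summarized as a small interpreter over the replies the SSP observes. This is a simplified, hypothetical sketch of the behavior described above (RETRY reissues the request, WRDY acknowledges receipt, RRDY signals system-wide completion, other replies are errors reported to the STA); reply names other than RETRY, WRDY, and RRDY are placeholders.

```python
def ssp_demap_result(replies):
    """Interpret the ordered replies the SSP sees for a demap it initiated.

    Before acknowledgement: RETRY -> request reissued on the VBus,
    WRDY -> MXCC has received the request, anything else -> error reply.
    After acknowledgement: RRDY -> demap completed across the system;
    other replies are ignored in this sketch.
    Returns "done", "error:<reply>", or "pending".
    """
    acknowledged = False
    for reply in replies:
        if not acknowledged:
            if reply == "RETRY":
                continue               # demap is retried on the VBus
            if reply == "WRDY":
                acknowledged = True    # first reply: request received
                continue
            return "error:" + reply    # reported to the STA that issued it
        if reply == "RRDY":
            return "done"              # second reply: system-wide completion
    return "pending"
```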
18.3.12 LDST (Load and Store)


Figure 18-21 shows a load and store instruction to an exclusively owned
block. The SSP provides the data to be written beginning at command strobe.
The MXCC saves this data, asserts WRDY to the SSP, and asserts OE
to the SRAMs to read the current data. Two cycles after OE, the MXCC
asserts RRDY as the SRAMs provide the current data. The MXCC then
completes the write operation to the SRAMs.


Figure 18-21. VBus LDST Exclusive Hit


Figure 18-22 shows the same operation when the data is shared. The MXCC
asserts WRDY to signal that the write data has been saved. The MXCC issues
a system cycle (normally CI on the MBus and a shared write on the XBus). When
the system cycle is over, the MXCC asserts OE, then asserts RRDY two
clocks later as data is provided by the SRAMs on the data bus. The MXCC then
writes the write data into the SRAMs.


Figure 18-22. VBus LDST Shared 
Figure 18-23 shows a non-cacheable load and store operation. In MBus
configurations, a locked read/write sequence is executed. In XBus configurations,
a swap single request packet is sent on the XBus.


Figure 18-23. VBus Non-Cacheable LDST 
18.3.13 VBus SRAM and Register Reads


Figure 18-24 shows the SSP reading the SRAMs via control space.
Figure 18-25 shows the SSP reading the internal MXCC registers in control
space. Figure 18-26 shows a write cycle to the MXCC registers.


Figure 18-24. VBus External Cache Read 
Figure 18-25. VBus Register Read

Figure 18-26. VBus Register Write

18.3.14 MXCC SRAM Access


Figure 18-27 shows the MXCC removing grants from SuperSPARC and reading
the SRAMs. The cycle shown is a Coherent Read and Invalidate (CRI) on the
VBus. This CRI causes the MXCC to issue an invalidate to SuperSPARC as
the MXCC reads the SRAMs.


Figure 18-27. VBus MXCC SRAM Read 
18.3.15 Data Bus Contention Avoidance


When there is a read on the VBus followed immediately by a write, the pipelined
SRAMs may still be processing the read when the write appears on the
address and command lines. If the data lines are also driven for the write in the
same cycle, the result will be a driver clash, with drivers in the SRAM trying to
drive the SRAM read data onto the VBus at the same time that the SSP is trying
to drive the write data onto the VBus.
To prevent this driver clash, the SSP monitors the OE signal and does not drive
the VBus DATA[63:0] and DPAR[7:0] signals until at least the third cycle after
OE has been sampled as asserted. Figure 18-28 illustrates the action of OE.
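The OE rule can be modeled as a search for the first legal drive cycle. Because SRAM read data appears two cycles after OE (see Subsection 18.3.2), an OE sampled at cycle t means the SRAMs drive DATA at t+2, so the earliest safe cycle for the SSP is t+3. The sketch below is a hypothetical model of that rule only; it ignores everything else that gates a write.

```python
def earliest_data_drive(oe_sampled, want_cycle):
    """Earliest cycle >= want_cycle at which the SSP may drive DATA/DPAR.

    oe_sampled[i] is True if OE was sampled asserted at rising edge i.
    Modeled rule: drive no earlier than the third cycle after OE was sampled
    asserted, i.e. OE must not have been sampled at cycle c-1 or c-2.
    """
    cycle = want_cycle
    while True:
        window = range(max(0, cycle - 2), min(cycle, len(oe_sampled)))
        recent = [i for i in window if oe_sampled[i]]
        if not recent:
            return cycle
        cycle = recent[-1] + 3  # third cycle after the latest OE seen
```

Each pass moves the candidate cycle strictly later, so the loop always terminates; back-to-back OE samples simply push the write data out until three cycles past the last one.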


Figure 18-28. Action of OE in Preventing Data Bus Driver Clash (regions labelled: SRAMs driving DATA; SSP driving DATA)




Chapter 19 


XBus 













XBus is an extension bus that allows the MXCC to be connected to one or more
system bus interfaces, called bus watchers. XBus uses an advanced,
synchronous, packet-switched protocol to provide low latency and high
bandwidth. XBus consists of 82 bussed signals, along with three point-to-point
arbitration signals per bus watcher.


19.1 XBus Overview


XBus (Figure 19-1) is designed as a clean dividing line between the functions
of the external cache controller, which are implemented in the MultiCache
Controller (MXCC), and the functions of a multiprocessor bus interface, which
are implemented in a bus watcher (BW). XBus communicates data requests
from the processor and tag state changes from the cache controller to one or
more BWs, and snoop requests and tag state changes from the BWs to the
cache controller.


In this environment, performance requires that the cache tags be
duplicated so that bus snooping and processor accesses can each be
performed with low latency. XBus provides the means for keeping the duplicate
tags in the cache controller and the BW consistent. The BW can initiate any
change in cache tag state, allowing proper operation with almost any
multiprocessor cache-coherency scheme the backplane bus requires.


XBus is adaptable to many multiprocessor backplane buses when used with
an appropriate BW. The BW is envisioned to be a customer-designed
semicustom device (ASIC) that adapts from the XBus protocol to the particular
backplane bus of the customer's system. BWs might adapt XBus to
Futurebus+ or other modern high-performance buses.


Figure 19-1. XBus System 
19.2 XBus Features
XBus supports high-performance computing through a number of features:


❑ Packet-switched bus for high bandwidth and multiple outstanding
operations.


❑ Division of function between cache controller functions in the MXCC and
bus interface functions in BWs.


❑ Flexible support for cache-coherent multiprocessing.


❑ Low-power, high-speed Gunning transceiver logic (GTL) electrical
interface.


❑ Support for up to four BWs per processor.


❑ Synchronous or asynchronous operation with the processor clock using
the MXCC's asynchronous interface mode.


❑ Separate 36-bit memory address space and 36-bit I/O address space.


19.2.1 Packet-Switched Bus


The XBus is a packet-switched (split-cycle) bus. A packet-switched bus
provides higher bandwidth over the same wires than a comparable
circuit-switched bus. This is possible because, on a packet-switched bus, the
bus wires are free for other use between the initial request of an operation and
the reply. For a memory read access, this free time on the bus corresponds to
the access latency of the memory, which can take many cycles on a high-speed
bus.


In a circuit-switched bus, a bus master (e.g., a processor) that needs to use a
slave (such as memory) arbitrates for the bus and obtains ownership. It
supplies a slave address and waits for a response. The slave either accepts
or supplies data and signals the master when it is done. The master then
releases the bus.


In a packet-switched bus, a processor that needs to use memory also acquires
the bus. But on a packet-switched bus, the processor keeps the bus just long
enough to send a request message consisting of a target ID, a command, its
own return ID, and data (if appropriate). After sending the message, the
processor releases the bus. Each resource on the bus (such as a memory
subsystem) monitors messages, looking for messages targeted to it. When a
bus resource detects such a message, it attempts to obey the command.
When the resource completes the requested operation, it signals completion
by acquiring the bus with the same arbitration mechanism and then sending
a completion message back to the originator of the command. The completion
message contains the ID of the slave, the requestor's ID, status information,
and data (as appropriate).


Broadcast messages (intended for multiple destinations) are also possible
with packet-switched buses. Each addressed resource can return a
completion message in turn by arbitrating for the bus.


19.2.2 Division of Functions 




XBus serves to separate cache functions as implemented in the MXCC from
bus interface functions as provided by the BW. By dividing these functions, the
MXCC can be used with a wide variety of system interconnects, both bussed
and non-bussed.


❑ MXCC
The MXCC is responsible for the following functions:
■ E-cache tags and control,
■ E-cache miss handling,
■ Prefetch handling,
■ XBus arbitration,
■ Block memory copy/zero,
■ Interrupt handling,
■ Synchronization for operating the bus at a clock rate different from the
processor's, and
■ BootBus.
❑ BW
A BW is responsible for the following functions:
■ Interfacing to the system interconnect (includes protocol, data
formatting, access control, electrical protocol, etc.),
■ Cache coherence protocol (includes snooping of the system bus or
participation in directory-based protocols),
■ Coordinating system-wide completion of demap requests,
■ Containing system resources that are duplicated per processor, and
■ Snooping its own operations.


19.2.3 System Configuration 


XBus offers flexible cache-coherent multiprocessing. The division of functions 
allows flexibility in configuring systems. Different BW designs may be devised 
to match a wide variety of system organizations and a wide variety of system 
interconnects. The cache-coherence protocol is controlled by the BW and 
XBus. The MXCC can be used with any protocol that can be mapped to the 
cache block states of the E-cache tags in the MXCC. 


The duplicate sets of cache tags in the MXCC and in the BWs must be kept 
consistent. In order to prevent momentary inconsistencies from resulting in a 
loss of system consistency, the BW tags are always updated first. The BW 
updates its snoop tags for each operation it receives from the MXCC or from 
the bus. When the snoop tags change, the E-cache tags are updated using the 
TCmd field of a request or reply packet to the MXCC. 


19.2.4 Electrical Interface 


XBus uses the GTL electrical interface. GTL is a low-voltage-swing
signaling scheme for high frequencies, low power, and low cost. When using
XBus and GTL levels, the GTL reference voltage (GTL-REF) must be
supplied to the MXCC and the BWs.

19.2.5 Bus Interfaces


The MXCC has direct support for one, two, or four BWs, so several system
buses can be used to provide more bandwidth for the system interconnect.
The built-in XBus arbiter in the MXCC provides arbitration for one, two, or four
BWs.


19.2.6 Dual Clocks 


The MXCC’s support for synchronous or asynchronous operation of the 
processor and the bus may be utilized in XBus systems. When asynchronous 
operation is selected, the processor clock may be faster than the XBus clock. 


19.2.7 Address Spaces
XBus has two sets of commands that access two separate address spaces:
❑ Memory Address Space
This is a byte-addressed, 36-bit, cache-coherent address space.
❑ I/O Address Space
This address space is also 36 bits. It is not cached by the MXCC.


19.2.8 Bus Watchers 


BWs interface XBus to application-specific system buses or devices. Their
function is to translate:


❑ XBus transactions into system bus or device operations, or


❑ System bus or device operations into XBus transactions.
At the lowest operational level, BWs:


❑ Receive XBus packets that request system bus resources or system bus
actions and map them to the appropriate system bus commands.


❑ Receive system bus responses or replies to these requests and map them
to XBus reply messages.


❑ Receive system bus commands directed to the XBus and convert them to
appropriate XBus command packets.


❑ Receive XBus replies and map them to corresponding system bus
responses.


❑ Snoop system bus operations for references to locally cached data and
send messages to the MXCC to perform coherency operations.

19.3 XBus Signals


XBus signals are divided into the following groups: 


❑ Control,


❑ Arbitration, and


❑ Data.


Except for the clock and data signals, all signals are encoded low true and are 
written with an overbar to indicate negated logical values. 


19.3.1 Control Signals 


The control group contains the bus clock and an error signal.


XCLK          Provides all timing for XBus signals. XCLK is an input to
              all devices on an XBus.

CCErr         Used by the MXCC to indicate an unrecoverable error in
              the MXCC or the SuperSPARC processor (SSP) to the
              BWs. Always driven by the MXCC. Bussed to all BWs.


19.3.2 Arbitration Signals


The arbitration group contains two request signals and a grant signal per
device.

XREQ0[1:0]    Bus request and control flow from BW0 to MXCC.
XREQ1[1:0]    Bus request and control flow from BW1 to MXCC.
XREQ2[1:0]    Bus request and control flow from BW2 to MXCC.
XREQ3[1:0]    Bus request and control flow from BW3 to MXCC.
XGNT0         Bus grant from MXCC to BW0.
XGNT1         Bus grant from MXCC to BW1.
XGNT2         Bus grant from MXCC to BW2.
XGNT3         Bus grant from MXCC to BW3.


Notes:


An XGNTn signal is asserted continuously for the number of cycles granted.
Its duration is two cycles or nine cycles, depending on the length requested
(see Section 19.4).


All signals in the arbitration group are point-to-point and are always driven.


19.3.3 Data Signals





The data group contains the 64 data signals and four parity signals.


XData[63:0]   These bussed bidirectional signals carry the information
              being transported on XBus as well as the headers that
              contain address and control information. A device drives
              XData only after receiving XGNT from the arbiter.

XPar[3:0]     These bussed bidirectional signals carry parity computed
              over XData. The parity for a given value of XData appears
              in the same cycle as the value. The XData signals checked
              by each of the XPar signals are:

              ■ XPar[3] checks XData[63:48]
              ■ XPar[2] checks XData[47:32]
              ■ XPar[1] checks XData[31:16]
              ■ XPar[0] checks XData[15:0]

              For each value of XData there are two correct encodings of
              XPar[3:0]: all even and all odd. In the all-even encoding,
              there is an even number of logic ones in each of the four
              sets of 17 bits, each set comprising a 16-bit portion of
              XData and the XPar bit that checks it. In the all-odd
              encoding, there is an odd number of ones in each of the
              same four sets of 17 bits.

              The all-even encoding is used for header cycles, while the
              all-odd encoding is used for data cycles. Any combination of
              XData and XPar that is neither all even nor all odd indicates
              a parity error.

              All-even parity during a data cycle is a special indication
              that the cycle is a memory fault cycle. See Memory Fault
              Cycle, page 19-20.
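The two parity encodings can be computed and checked as below. This is a minimal Python sketch of the rules just stated (XPar[i] covers XData[16i+15:16i]); the function names are hypothetical.

```python
def _ones(value):
    """Number of logic ones in a non-negative integer."""
    return bin(value).count("1")


def xpar_bits(xdata, header):
    """XPar[3:0] for a 64-bit XData value.

    header=True gives the all-even encoding (header cycle);
    header=False gives the all-odd encoding (data cycle).
    """
    par = 0
    for i in range(4):
        ones = _ones((xdata >> (16 * i)) & 0xFFFF)  # XData[16i+15:16i]
        bit = ones & 1 if header else (ones & 1) ^ 1
        par |= bit << i
    return par


def parity_class(xdata, xpar):
    """'all-even', 'all-odd', or 'parity error' for one XBus cycle."""
    sets = [(_ones((xdata >> (16 * i)) & 0xFFFF) + ((xpar >> i) & 1)) & 1
            for i in range(4)]
    if sets == [0, 0, 0, 0]:
        return "all-even"
    if sets == [1, 1, 1, 1]:
        return "all-odd"
    return "parity error"
```

An "all-even" result marks a header cycle when it starts a packet, and a memory fault cycle when it appears in a data-cycle position.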
19.4 XBus Basic Operation


The XBus operation has three levels:
❑ Cycles,
❑ Packets, and
❑ Transactions.


Cycles        A bus cycle is one period of the clock; it forms the unit of
              time and of one-way information transfer.

Packets       A packet is a contiguous sequence of cycles that constitutes a
              unidirectional command and information transfer. XBus packets
              are either two or nine cycles (one cycle with command and status
              information and either one or eight cycles of data).

Transactions  A transaction generally consists of a request and reply pair of
              packets, but a few transaction types have only a request packet.


Each chip on the XBus can have several XBus sources and targets, each
identified by a unique XBus ID. An arbiter permits the bus to be multiplexed
among the various chips. Before a chip can send a packet, it must be granted
mastership by the arbiter. Once it has control of the bus, it puts the packet on
the bus one cycle at a time, without interruption.


A reply packet must be sent to the target ID that was the source ID of the 
request packet that it answered. 


The basic layout of a two-cycle packet is shown in Figure 19-2; the basic 
layout of a nine-cycle packet is shown in Figure 19-3. Each begins with a 
header cycle. The packets complete with one or eight data cycles. The 
interpretation of the data cycles depends on the command in the header cycle. 


Figure 19-2. Basic Format of a Two-Cycle Packet
(Header Cycle: first cycle, all-even parity. Data Cycle: second cycle, all-odd parity.)



Figure 19-3. Basic Format of a Nine-Cycle Packet
(Header Cycle: first cycle, all-even parity. Data Cycles 0 through 7: second through ninth cycles, all-odd parity.)


19.4.1 Packet Headers

A header cycle is the first cycle of every XBus packet. The header contains 
data and tag commands, the memory or I/O address involved, and the XBus 
IDs of the source and destination of the packet. 


Header cycles are marked with all-even parity on the parity lines (XPAR[3:0]).


Data cycles have odd parity computed on each of four groups of 16 bits
([63:48], [47:32], [31:16], [15:0]). Devices must use parity checking to help
locate packet headers in the stream of cycles on the bus.
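The parity rule above can be sketched in C. This is a minimal illustration, not the MXCC or BW logic: the function names are hypothetical, and the assignment of XPAR[3] to DATA[63:48] down to XPAR[0] to DATA[15:0] is an assumption, since the text lists the groups but not which parity line covers each. "Even parity" is taken to mean that each 16-bit group plus its parity bit holds an even number of 1 bits.

```c
#include <stdint.h>

/* Returns 1 if v has an odd number of 1 bits, 0 otherwise. */
static unsigned parity16(uint16_t v)
{
    v ^= v >> 8;
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return v & 1u;
}

/* Hypothetical helper: compute XPAR[3:0] for one 64-bit XBus cycle.
 * Assumed mapping: XPAR[i] covers DATA[16*i+15 : 16*i].
 * header != 0 selects the all-even encoding used for header cycles;
 * header == 0 selects the all-odd encoding used for data cycles. */
static unsigned xbus_xpar(uint64_t data, int header)
{
    unsigned xpar = 0;
    for (unsigned i = 0; i < 4; i++) {
        uint16_t group = (uint16_t)(data >> (16 * i));
        /* Even parity: the parity bit equals the XOR of the group, so the
         * 17 bits together contain an even number of 1s. */
        unsigned bit = parity16(group);
        if (!header)
            bit ^= 1u;  /* odd parity is the complement */
        xpar |= bit << i;
    }
    return xpar;
}
```

For example, an all-zero data cycle needs XPAR = 0xF (each group needs one extra 1 to make its count odd), while an all-zero header cycle needs XPAR = 0x0.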


The basic format of the header cycle is shown in Figure 19-4. 
Figure 19-4. Basic Format of the Header Cycle
The fields of the header cycle are introduced here and will be described in
detail later in this chapter. 


PTYP Packet Type. Names the data command used to transfer data
between BWs and the MXCC.

R Reply packet. This bit is set on reply packets and clear 
on request packets. 

L Packet Length: This one-bit field indicates the packet 
length. (0 = two-cycle; 1 = nine-cycle) 

E Error. In a reply packet, indicates that the corresponding
request packet encountered an error.

r reserved. 

XSrc XBus Source ID. 

XDst XBus Destination ID. Specifies the intended recipient of 
this packet. 

Size Number of bytes of data that will be transported by this 
transaction. 

TCmd Tag Command. Specifies changes in cache or snoop tags
for the specified address.

Address Physical byte address as needed by the data command
(DCmd) and the tag command (TCmd).


The TCmd is used to keep the bus and processor side copies of cache tags
consistent with one another. The TCmd and DCmd commands, along with the
various control bits, provide sufficient flexibility to accommodate a variety of
system buses.


All devices are responsible for monitoring the bus for packets addressed to 
them. If they see a packet addressed to them, they perform the command 
contained in the header cycle. 
19.4.2 Bus Watcher ID Decoding and Addressing


The MXCC sends all request packets destined for any BW to a single XDST of
0x10. BWs recognize these packets by the XDST field in the header, and use
Address[9:8] (if present) to partition requests among the several BWs.
Table 19-1 shows how Address bits participate in decoding which BW is
addressed. Only one BW replies to any request from the MXCC, although in
some systems all BWs may need to act upon the tag command. All BWs need
to act upon broadcast messages.
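As an illustration of the decode described above, the following C sketch checks whether a given BW should answer a request. Because the contents of Table 19-1 are not reproduced here, the mapping is a hypothetical one: a system with two BWs is assumed to select on Address[8] and a system with four BWs on Address[9:8]. The function and parameter names are also illustrative.

```c
#include <stdint.h>

#define XDST_BW 0x10u  /* XDST used by the MXCC for all BW-bound requests */

/* Hypothetical decode: returns 1 if BW number bw_index, in a system with
 * num_bws bus watchers (assumed to be 1, 2 or 4), should reply to a request
 * carrying the given XDST and ADDRESS fields. */
static int bw_selected(unsigned xdst, uint64_t address,
                       unsigned bw_index, unsigned num_bws)
{
    if (xdst != XDST_BW)
        return 0;              /* not addressed to the BWs */
    if (num_bws <= 1)
        return 1;              /* a lone BW answers every request */
    /* Assumed: Address[8] for two BWs, Address[9:8] for four. */
    unsigned sel = (unsigned)(address >> 8) & (num_bws - 1u);
    return sel == bw_index;
}
```

This only picks the replying BW; as the text notes, other BWs may still need to act on the tag command, and all of them act on broadcast messages.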


Table 19-1. BW Addressing


In addition to sending packets to the MXCC, a BW can send packets onto XBus 
addressed to itself or to other BWs. 


19.4.3 XBus Arbitration 


The MXCC contains a pipelined arbiter that controls access to XBus. To use
the bus, BW number n requests it on the point-to-point XREQn[1:0] lines.
The bus request is always for a specific number of bus cycles (either two or
nine cycles). The arbiter inside the MXCC grants ownership with a
point-to-point grant line, XGNTn. The grant line is generally asserted for the
number of cycles allocated to the requesting device (either two or nine
cycles). The device that receives the grant drives the bus for exactly two or
nine cycles. See Section 19.10 for details.


19.4.4 Packet Priority 
Packets from the BWs are categorized as either high-priority or low-priority.
Packets given high priority are limited to system bus reply packets resulting
from cache read misses on the VBus. All other reply packets and all request
packets are given low priority. A BW that has bus grant must send all
high-priority packets in its queues before sending any low-priority packets.
This priority mechanism allows packets containing data that the SSP is waiting
for (such as data to complete a read miss) to bypass packets that are not as
critical to the processor (such as prefetch data or requests from the system
bus).


To accomplish this priority scheme, the MXCC uses two first-in, first-out
queues (FIFOs) each for received and transmitted packets (see Figure 19-5).
Most BW designs would also use dual queues.
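The ordering rule can be sketched as a pair of FIFOs drained high-priority first. This is a hypothetical model in C: packets are represented by integer handles, and the queue depth of 8 is an arbitrary assumption rather than a figure from this manual.

```c
#include <stddef.h>

#define QDEPTH 8  /* hypothetical queue depth */

struct fifo {
    int slots[QDEPTH];   /* packet handles standing in for real packets */
    size_t head, count;
};

/* Append a packet; returns 0 if the queue is full. */
static int fifo_push(struct fifo *q, int pkt)
{
    if (q->count == QDEPTH)
        return 0;
    q->slots[(q->head + q->count) % QDEPTH] = pkt;
    q->count++;
    return 1;
}

/* Remove the oldest packet; returns 0 if the queue is empty. */
static int fifo_pop(struct fifo *q, int *pkt)
{
    if (q->count == 0)
        return 0;
    *pkt = q->slots[q->head];
    q->head = (q->head + 1) % QDEPTH;
    q->count--;
    return 1;
}

/* On each grant, every high-priority packet goes out before any low one. */
static int next_packet(struct fifo *high, struct fifo *low, int *pkt)
{
    return fifo_pop(high, pkt) || fifo_pop(low, pkt);
}
```

fifo_push returning 0 marks the point at which a real design would have to stop accepting packets, which is why the queues must be protected from overflow.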
Figure 19-5. Queues in Bus Watchers and MXCC
[Figure: the System Bus, the Bus Watcher, and the MXCC, with the queues listed below.]


The queues shown in Figure 19-5 are as follows:

• BOL: BW low-priority outgoing queue that contains requests for system bus
  resources.

• BOH: BW high-priority outgoing queue that contains replies to system
  requests.

• BW low-priority input queue that contains system requests and most system
  replies.

• BW high-priority input queue that contains read miss replies.

• Low-priority MXCC output queue (requests).

• High-priority MXCC output queue (replies).

• XIH/XIL: Low- and high-priority MXCC input queues (actually combined into
  one queue).

• Queue that holds low-priority XBus arbitration requests.

• Queue that holds high-priority XBus arbitration requests.
Since the operation of both the BWs and the MXCC depends on queues, it is
important that the queues never overflow. The MXCC protects its input queue
(XIH/XIL) from overflowing with the XGNTn lines: the MXCC does not issue
XGNTn to any BW when its input queue is too full.


In order to protect its queues (BOH/BOL) from overflowing, a bus watcher uses
the XREQn lines to inhibit packets on the XBus from the MXCC (either
requests or replies). A BW may also request the XBus and simultaneously
inhibit packets coming to it from the MXCC. The following is a list of the
commands that can be encoded on the XREQn lines:

• Inhibit request packets from the MXCC to the BWs and system bus.

• Inhibit request and reply packets from the MXCC to the BWs and system
  bus.

• Request the XBus to send a low-priority packet from the BW to the MXCC.

• Request the XBus to send a high-priority packet from the BW to the MXCC.

• Request the XBus to send either a low- or high-priority packet from the BW,
  and also inhibit requests and replies from the MXCC to the BWs.




XBus 







19.5 XBus Protocol 


XBus follows a protocol that is understood by all devices. The XBus protocol
is defined in terms of cycles, packets, and transactions.


19.5.1 Cycles


A bus cycle is one period of the bus clock; it forms the unit of time and one-way
information transfer.


All cycles on the XBus fall into one of four categories:

• Header
  A header cycle is always the first cycle of a packet. The header defines the
  packet size and conveys the data command, tag command, and address
  for the packet. Header cycles have all-even parity.

• Data
  Data cycles normally constitute the remaining cycles of a packet and convey
  the data that accompanies the command. Data cycles have all-odd parity.

• Memfault
  MEMFAULT cycles are used to indicate an error in one of the data cycles of
  a packet. Memfault cycles have all-even parity.

• Idle
  Idle cycles are those during which no packet is being transmitted on the
  bus. Idle cycles have all-odd parity.


A given cycle with all-even encoding is a HEADER cycle if it is the first cycle
of a packet; otherwise, it is a MEMFAULT cycle. A given cycle with all-odd
encoding is a DATA cycle if it is known to lie inside some packet; otherwise, it
is an IDLE cycle.


When the parity encoding is neither all-even nor all-odd, the cycle has a
transmission parity error.


19.5.2 Packets


A packet is a contiguous sequence of cycles. The first cycle (header) of a
packet carries address and control information, while subsequent cycles carry
data. Packets come in two sizes: two cycles and nine cycles. The size is
indicated by the L bit in the header cycle.


An XBus device sends a packet after arbitrating for the XBus and getting grant.
Packet transmission by a device is uninterruptible once the header cycle has
been sent.


A six-bit DCmd field in each packet header encodes the packet type. One of
these bits (the R bit) encodes whether the packet is a request or a reply; the
other five bits form the packet type (PTYP) and encode the transaction type.
Detailed information about various DCmds is provided in Subsection 19.5.5.


19.5.3 Transactions 


A transaction consists of a pair of packets (a request packet and a reply
packet) that together perform a logical function. The PTYP for a reply is the
same as for the request to which it is responding. Replies are sent to the source
of the request, so the XDST field of a reply is the same as the XSRC from the
request.


Packets usually come in pairs, but there are two exceptions to this. For the 
Flush Block transaction, several reply packets may be generated for one 
request. Since Flush Block Requests are unilaterally generated within the 
MXCC, several Flush Block Reply packets may be sent without any request 
on XBus. For a transaction that times out, no reply packet may be generated. 
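The reply-addressing rules above lend themselves to a short sketch. The struct below holds the header fields as separate members because the bit positions of the header cycle are not reproduced here; the struct, function, and parameter names are hypothetical, and the caller is assumed to supply the reply length and the tag command appropriate to the transaction.

```c
#include <stdint.h>

/* Header fields kept as separate members (hypothetical layout). */
struct xbus_header {
    unsigned ptyp;    /* 5-bit packet type */
    unsigned reply;   /* R bit: 0 = request, 1 = reply */
    unsigned length;  /* L bit: 0 = two-cycle, 1 = nine-cycle */
    unsigned error;   /* E bit, meaningful only in replies */
    unsigned xsrc;    /* XBus source ID */
    unsigned xdst;    /* XBus destination ID */
    unsigned size;    /* SIZE code */
    unsigned tcmd;    /* tag command */
    uint64_t addr;    /* 36-bit physical byte address */
};

/* Build the header of a reply to req, sent by the device whose ID is my_id. */
static struct xbus_header make_reply(const struct xbus_header *req,
                                     unsigned my_id, unsigned reply_length,
                                     unsigned tcmd)
{
    struct xbus_header r = *req;  /* PTYP, SIZE and ADDR carry over unchanged */
    r.reply  = 1;                 /* same PTYP, but R = 1 */
    r.length = reply_length;      /* may differ from the request's length */
    r.error  = 0;
    r.xdst   = req->xsrc;         /* reply goes back to the requester */
    r.xsrc   = my_id;
    r.tcmd   = tcmd;              /* the replier chooses the tag command */
    return r;
}
```

For a Get Block transaction, for instance, the request is two cycles and the reply nine, so reply_length would be 1.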


19.5.4 Packet Detection 


Header cycles are indicated by even parity encoding on each of the four parity 
bits XPAR[3:0]. The XBus device uses this information as well as its current 
XBus state information to recognize a header cycle. Each XBus device is 
expected to track activity on the bus and know whether the current cycle is idle, 
a header cycle or a data cycle. The purpose of the four parity bits is to confirm 
the expected bus state rather than to detect it. 


Once the header cycle has been recognized, the XBus device expects data.
The number of data cycles is determined by the length bit in the message
header. Data and idle cycles have the same parity encoding, so parity alone
cannot distinguish between them.
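A device's bus tracking can be sketched as a small state machine in C that applies the cycle rules from Subsection 19.5.1. The names are hypothetical, the XPAR[i]-to-DATA[16*i+15:16*i] mapping is an assumption, and the caller is assumed to decode the L bit from the header (its bit position is not modeled). How a real device recovers from a transmission parity error is not described here, so the sketch simply counts such a cycle against the current packet.

```c
#include <stdint.h>

enum xpar_kind  { XPAR_ALL_EVEN, XPAR_ALL_ODD, XPAR_MIXED };
enum cycle_kind { CYC_HEADER, CYC_DATA, CYC_MEMFAULT, CYC_IDLE, CYC_PARITY_ERROR };

/* Returns 1 if v has an odd number of 1 bits. */
static unsigned parity16(uint16_t v)
{
    v ^= v >> 8; v ^= v >> 4; v ^= v >> 2; v ^= v >> 1;
    return v & 1u;
}

/* Classify XPAR[3:0] against the data (assumed: XPAR[i] covers bits 16i+15..16i). */
static enum xpar_kind check_xpar(uint64_t data, unsigned xpar)
{
    unsigned odd_groups = 0;
    for (unsigned i = 0; i < 4; i++)
        odd_groups += parity16((uint16_t)(data >> (16 * i))) ^ ((xpar >> i) & 1u);
    if (odd_groups == 0) return XPAR_ALL_EVEN;
    if (odd_groups == 4) return XPAR_ALL_ODD;
    return XPAR_MIXED;
}

struct xbus_tracker {
    unsigned data_left;   /* data cycles still expected in the current packet */
};

/* Feed one cycle; length_bit is the L bit, used only if this is a header. */
static enum cycle_kind track_cycle(struct xbus_tracker *t,
                                   enum xpar_kind parity, int length_bit)
{
    if (t->data_left == 0) {              /* between packets */
        if (parity == XPAR_ALL_EVEN) {
            t->data_left = length_bit ? 8u : 1u;
            return CYC_HEADER;
        }
        return parity == XPAR_ALL_ODD ? CYC_IDLE : CYC_PARITY_ERROR;
    }
    t->data_left--;                       /* inside a packet: one data slot used */
    if (parity == XPAR_ALL_ODD)
        return CYC_DATA;
    return parity == XPAR_ALL_EVEN ? CYC_MEMFAULT : CYC_PARITY_ERROR;
}
```

The parity check only confirms what the tracker already expects; the packet boundaries come from counting cycles after each header.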


19.5.5 Header Cycle Format 


Header cycles are indicated by the all-even encoding on the parity wires. The 
format of the header bits is shown in Figure 19-6. 


Figure 19-6. XBus Header Cycle Encoding 


Bit field explanations:
DCMD Data Command. This field contains the six-bit data
command composed of the five-bit PTYP and one-bit R fields.
Refer to Subsection 19.7.1.

PTYP Packet Type, a subfield of DCmd. Indicates the type of
transaction to which this packet belongs.

R Reply, a subfield of DCmd. This bit indicates that the
packet is a reply rather than a request. The reply to a
request has the same PTYP with R = 1.
• 0: Request packet.
• 1: Reply packet.

L Packet Length. This one-bit field indicates the packet
length.
• 0: Two-cycle packet
• 1: Nine-cycle packet

E Error. The error bit is defined only for reply packets. It
indicates that the corresponding request packet encountered
an error. The second cycle of an error reply provides
additional error information, and the remaining cycles are
ignored. Note that the length of an error reply packet is the
same as the length of the corresponding normal reply packet.
See Subsection 19.5.6 for more information on the ERR field.

r One-bit reserved field. Value must be zero.

XSRC XBus Source ID. Used by the MXCC to more fully qualify
the DCmd fields of packets it sends to BWs. The MXCC
reflects the XSRC field of packets from BWs into the
XDST field of replies.

XDST XBus Destination ID. This field specifies the intended
recipient of this packet. All packets sent by the MXCC to
BWs have XDST = 0x10. Packets from BWs to the
MXCC must have XDST as shown in Table 19-9 and
Table 19-10.

When the XDST specifies a BW (0x10), additional
information such as address bits may be needed to select
between the several BWs on this XBus.
SIZE The size field specifies the number of bytes of data that
will be transported by the current transaction. See
Table 19-2.

Table 19-2. Size Field Specifications

The MXCC never generates packets with reserved size
codes in Table 19-2. The MXCC has undefined behavior
if it receives a packet with a size code shown as
reserved.

TCMD Tag Command. The tag command specifies how the tag
entry for the specified address should be manipulated.
Refer to Subsection 19.7.2.

ADDR Address. The 36-bit address field specifies the byte
address. The ordering of bytes is big-endian. References
are aligned on the same boundaries as the size. For
example, byte references are allowed on byte boundaries,
but 16-bit references are allowed only on 16-bit boundaries
(addresses 0x0000, 0x0002, 0x0004, etc.), and 32-bit
references are allowed only on 32-bit boundaries
(addresses 0x0000, 0x0004, 0x0008, 0x000C, etc.).


19.5.6 Errors on XBus


Errors on XBus can be reported in one of three ways: in error reply packets,
which are reply packets with the E bit set; through the XPAR[3:0] signals; or
on the CCERR pin. See Section 16.8.
Reply packets may have the E bit set in the header to indicate that the
corresponding request could not be processed correctly. The second cycle of
an error reply provides additional error information, and the remaining cycles,
if it is a nine-cycle reply, are ignored. Note that the length of an error reply
packet is the same as the length of the corresponding normal reply packet.


The format of the Error Data Cycle is shown in Figure 19-7. The Error Data 
Cycle has the normal parity for data cycles, all odd. 


Figure 19-7. Format of Error Data Cycle 




The least significant three bits of the error data cycle convey an error code, 
ECODE. Table 19-3 shows the encoding of the ECODE field. 


Table 19-3. Encoding of the ECODE Field of the Error Data Cycle


The ECODE field encodes errors. The codes are largely uninterpreted and are
logged in the MXCC's error register (see MXCC Error Register, page 0-42).
The meanings of the codes are similar to those of the MBus errors of the same
name.


• BE (Bus Error)
  This error code indicates a bus error such as a parity error on the system
  bus. It can also be used to indicate another system implementation-dependent
  error.

• TO (Timeout)
  This code may be generated by a BW after some length of time has
  elapsed without a reply from the system bus. This error code can also be
  used to indicate a system implementation-dependent error.

  Note that XBus itself does not time out transactions, but BWs may generate
  timeout errors for transactions that have taken longer than expected to
  complete. Because no device on XBus takes responsibility for them,
  packets addressed to an invalid XDST are lost and never generate a
  timeout error.

• UC (Uncorrectable)
  This code is signalled when the addressed system device (usually a
  memory controller) encounters an uncorrectable error (such as a parity
  error or an uncorrectable ECC error) in accessing the data. This error
  code can also be used to indicate a system implementation-dependent error.

• UD (Undefined)
  This code indicates a system implementation-dependent error.


Memory Fault Cycle 


Parity may be used to communicate a data error. The sending device can drive 
all-even parity on a data cycle. This indicates that a memory fault condition 
exists for the corresponding data. A data cycle with all even parity is called a 
memory fault cycle. The format of a memory fault cycle is shown in 
Figure 19-8. 


Figure 19-8. Format of Memory Fault Cycle 
The least significant three bits of the memory fault cycle convey an error code,
ECODE. Table 19-3 shows the encoding of the ECODE field.


Transmission Parity Errors 


A parity error can happen at any time. Parity that is neither all-even nor all-odd
indicates an error in the transmission on XBus. The MXCC reports an XBus
transmission parity error by asserting the CCERR pin.


E-cache Parity Errors 


When the MXCC detects a parity error in the external cache, the MXCC
asserts the CCERR pin. The MXCC and the SSP both generate and check even
parity on VBus when VBus parity checking is enabled in CCCR.PE and
MCNTL.PE, respectively.
19.6 Cache-Consistency Protocols


The MXCC supports full cache consistency on the system bus and between 
the SSP and the MMC. MBus and XBus configurations differ in consistency 
support and protocol. In the MBus configuration, the MXCC supports an MBus 
cache-consistency protocol. In the XBus configuration, the cache-consistency 
protocol is supported cooperatively by the MXCC and the bus watchers. 
Both protocols implement data ownership. A sub-block of data may be owned
by a particular cache in the system. This owner is responsible for supplying the
data and for writing the data back to main memory. Memory owns all data not
owned by any cache.


19.6.1 Sub-Block States 


The E-cache tags store cache-consistency states for each sub-block. Three 
bits are used to encode the state, but only five states are used. The encoding 
of the consistency states is shown in Table 19-4. 


Table 19-4. Encoding of Cache-Consistency States

  State                        S   O   V
  Invalid                      x   x   0
  Exclusive & Clean (EX&C)     0   0   1
  Exclusive & Dirty (EX&D)     0   1   1
  Shared & Clean (SH&C)        1   0   1
  Shared & Dirty (SH&D)        1   1   1

  (x = don't care)
In addition, there is another bit per sub-block in the E-cache tags, the P
bit, that is used only to prevent the SSP from sending multiple accesses to
the same sub-block. This bit does not participate in the cache-consistency
protocols.
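Table 19-4 can be read as a small decoder. In this hypothetical C sketch the three state bits are assumed to be packed as S in bit 2, O in bit 1, and V in bit 0; the manual does not specify that packing, and the enum and function names are illustrative.

```c
/* Sub-block consistency states from Table 19-4. */
enum sb_state { SB_INVALID, SB_EX_CLEAN, SB_EX_DIRTY, SB_SH_CLEAN, SB_SH_DIRTY };

/* Decode a packed (S, O, V) value; packing S=bit 2, O=bit 1, V=bit 0 is assumed. */
static enum sb_state decode_subblock_state(unsigned sov)
{
    unsigned s = (sov >> 2) & 1u;
    unsigned o = (sov >> 1) & 1u;
    unsigned v = sov & 1u;

    if (!v)
        return SB_INVALID;                    /* S and O are don't-care */
    if (!s)
        return o ? SB_EX_DIRTY : SB_EX_CLEAN; /* exclusive states */
    return o ? SB_SH_DIRTY : SB_SH_CLEAN;     /* shared states */
}
```

Five of the eight possible 3-bit values name distinct states; the remaining invalid encodings all collapse to SB_INVALID because V = 0.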













19.6.2 State Transitions 


In the XBus configuration, the MXCC implements only a very few state
transitions directly. Instead, the BWs direct the state transitions on all
accesses from the system bus and on any accesses by the SSP that require
access to the system bus. Only transitions for processor read hit and
processor exclusive write (S=0, O=1) are implemented in the MXCC.


The BWs control the cache-consistency state by passing a state transition
command using the TCMD field of the XBus command header. The TCMD
field indicates how the S, O, and V bits should be updated for the transaction.
This scheme allows the BW to implement the state transition diagram of the
system bus to which it is connected without using a fixed algorithm. A BW
commands state transitions that are not shown in Figure 19-9 in order to
implement the system bus's cache-consistency scheme.
The owner of a sub-block is the E-cache of the processor that last wrote to the
sub-block. When the SSP issues a write to a sub-block not exclusively owned
(S=1 or O=0), the MXCC issues a write command to the BW. The BW
communicates the shared write to other caches in the system on the system
bus. The BW can then send a write reply to the MXCC with the TCMD field
indicating that the MXCC is to assume ownership.


The state transitions implemented directly in the MXCC are shown in 
Figure 19-9. 


Figure 19-9. MXCC XBus Cache-Consistency State Diagram 
19.7 XBus Commands


Bus commands are carried in the header cycle of each packet. XBus defines 
two types of commands: 


• Data commands (DCmd).

• Tag commands (TCmd).


19.7.1 Data Commands 


The data command is used by the sender to get data from or put data into
another device. The data command is composed of the PTYP and RPLY fields
of the header. When RPLY is 0, the command is a request. When it is 1, the
command is a reply. Table 19-5 shows the data commands by PTYP and the
packet length of the request and reply packets of each PTYP.


Both types of command are present in the header cycle of a packet. The two 
commands operate independently. 


Table 19-5. Data Commands

  Data Command        PTYP     Notes
  NC Get Block
  Flush Block
  Get Single
  Put Single
  Get Block
  Put Block                    reply packet: 2 or 9 cycles
  IO Get Single
  IO Put Single
  IO Get Block
  IO Put Block
  Demap               01110
  Swap Single         10101
NoOp Request


The NoOp command indicates that no data is being transferred. It is useful
because it allows the tags of an entry to be manipulated via TCMD without
having to transfer data.


The NoOp Request packet is two cycles long. The first cycle is the header; the
second cycle is unused.


NoOp Reply


A NoOp Reply packet is used to reply to a NoOp Request packet. It indicates
acknowledgement of the NoOp Request packet. PTYP, SIZE, and ADDRESS
are identical to those in the request header, and the XDST field is identical to
the request packet's XSRC field.


Non-Cacheable Get Block Request 


NC Get Block is used by an XBus device to read a block from physical address
space (memory space as opposed to I/O space) when it does not intend to
cache the block.


An NC Get Block Request packet requests that the block specified by the
ADDRESS field in the header be returned via an NC Get Block Reply. The
packet is two cycles long. The first cycle contains the header; the second cycle
is unused.


Non-Cacheable Get Block Reply 


An NC Get Block Reply packet is a response to an earlier NC Get Block
Request. The packet is nine cycles long. The header cycle reflects most of the
information in the request header, while the eight cycles of data supply the
requested block in critical word first order. Critical word first order means that
the byte addressed in the request is returned in the first data cycle, and
additional doublewords of the block, at increasing addresses modulo 64 (the
block size in bytes), are sent in successive data cycles.
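Critical word first order amounts to a wrap-around walk through the eight doublewords of the 64-byte block. The minimal C sketch below (the function name is hypothetical) lists the doubleword address carried by each data cycle.

```c
#include <stdint.h>

/* Fill order[0..7] with the byte addresses of the eight doublewords of a
 * 64-byte block, in critical-word-first order starting at addr. */
static void critical_word_first(uint64_t addr, uint64_t order[8])
{
    uint64_t base  = addr & ~UINT64_C(63);           /* start of the 64-byte block */
    unsigned first = (unsigned)(addr & 63u) & ~7u;   /* doubleword holding the addressed byte */
    for (unsigned i = 0; i < 8; i++)
        order[i] = base + ((first + 8u * i) & 63u);  /* wrap modulo 64 */
}
```

For an addressed byte at 0x1028, the data cycles carry the doublewords at 0x1028, 0x1030, 0x1038, 0x1000, 0x1008, 0x1010, 0x1018, and 0x1020. A 64-byte-aligned address degenerates to natural order.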


PTYP, SIZE, and ADDRESS are identical to those in the request header; the 
XDST field is identical to the request packet's XSRC field. 


Flush Block Request 
This command is neither sent nor handled by the MXCC.
Flush Block Reply


The MXCC generates Flush Block Reply packets automatically after it
generates a Get Block Request packet. These Flush Block Reply packets
have no corresponding request packet. They are used by the MXCC to return
dirty sub-blocks from the victimized block to memory. The victimized block is
the block that will contain the newly read data from the Get Block Reply
packet.


The packet is nine cycles long. The eight cycles of data transmit the data in
natural order. Natural order means that bytes 0-7 of the cache block are in the
first data cycle, followed by data for increasing addresses in successive
cycles. ADDRESS[5:0] are zero. SIZE is 64 bytes, and ADDRESS[35:8] is the
physical address of the block in the E-cache. The XDST field is 0x10.
ADDRESS[7:6] gives the number of the sub-block within the block being
flushed.
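The ADDRESS field of a Flush Block Reply can be assembled as sketched below. The function name is hypothetical, and block_paddr is assumed to be the physical address of the E-cache block being flushed; only its bits 35:8 are used.

```c
#include <stdint.h>

#define XBUS_PA_MASK ((UINT64_C(1) << 36) - 1)  /* 36-bit physical address */

/* ADDRESS[35:8] from the block address, ADDRESS[7:6] = sub-block number,
 * ADDRESS[5:0] = 0. */
static uint64_t flush_block_address(uint64_t block_paddr, unsigned subblock)
{
    uint64_t block_bits = block_paddr & XBUS_PA_MASK & ~UINT64_C(0xFF);
    return block_bits | ((uint64_t)(subblock & 3u) << 6);
}
```

Flushing sub-block 2 of the block at 0x123456700 therefore produces ADDRESS = 0x123456780.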


Get Single Request 


This command is used to fetch a single (up to doubleword) datum from the 
36-bit physical address space. It is not supported or generated by the MXCC. 


Get Single Reply 


A Get Single Reply packet is two cycles long. The header cycle reflects most 
of the information in the request header, while the data cycle supplies the 
requested single. PTYP, SIZE and ADDRESS are identical to those in the 
request header, and the XDST field is identical to the request packet's XSRC 
field. 


Put Single Request 


This command is used to store a single (up to 64 bits) datum into the 36-bit 
physical address space. 

The Put Single Request packet requests that a given value be written into the
single specified by the ADDRESS and SIZE fields of the header. Note that the
value to be written may be a byte, a half-word, a word, or a doubleword as
specified in the SIZE field. A Put Single packet is two cycles long. The first cycle
is the header; the second cycle contains the value to be written.


Put Single Reply 


A Put Single Reply packet is used to acknowledge an earlier Put Single 
Request. The header cycle reflects most of the information from the request 
packet header; the second cycle contains the data that was written, just as in 
the request packet. PTYP, SIZE, and ADDRESS are identical to those in the 
request header, and the XDST field is identical to the request packet's XSRC 
field. 
Get Block Request


This command is used to read a block from physical address space with the 
intent of caching it. 


A Get Block Request packet requests that the block specified by the
ADDRESS field in the header cycle be returned via a Get Block Reply packet.
A Get Block Request packet is two cycles long. ADDRESS[2:0] are ignored by
the MXCC and are always zero in Get Block Request packets issued by the
MXCC. The first cycle is the header and the second cycle contains the address
of the block being victimized, if any. The format of the victim address is shown
in Figure 19-10.


Figure 19-10. Format of Victim Address Data Cycle


The fields of the victim address data cycle are:


VV Victim Valid:
• 0: Invalid. No victim.
• 1: Valid victim address.


ZERO Always zero.


VICTIM ADDRESS Physical address of the cache block to be replaced.


If VV is clear, there are no valid sub-blocks in the selected block to be replaced.
If VV is set, there are one or more valid sub-blocks in the selected block to be
replaced. The valid sub-blocks may be clean or dirty. If a victim sub-block is
dirty, it will be written with a Flush Block Reply packet after the Get Block
Request has been sent.


Get Block Reply

A Get Block Reply packet is a response to an earlier Get Block Request. The 
packet is nine cycles long. The header cycle reflects most of the information 
in the request packet header, while the eight data cycles transmit the 
requested block of data in critical word first order (see Non-Cacheable Get 
Block Reply, page 19-24). 


PTYP, SIZE and ADDRESS are identical to those in the request header, and 
the XDST field is identical to the request packet's XSRC field. 
Put Block Request


This command is used to write a block into physical address space.


A Put Block Request packet requests that a given value be written into the
block specified by the ADDRESS field of the header. A Put Block packet is nine
cycles long. The first cycle is the header; the remaining eight cycles transmit
the data to be written in natural order (see Flush Block Reply, page 19-25).
ADDRESS[5:3] must be zero. ADDRESS[2:0] are ignored.


Put Block Reply

A Put Block Reply packet is a response to an earlier Put Block Request packet.
There are two sizes of Put Block Reply packets, two cycles and nine cycles.
A two-cycle Put Block Reply simply acknowledges that the Put Block Request
has been performed. The first cycle is the header; the second cycle is unused.


A nine-cycle Put Block Reply is used for MXCC stream operations. It not only
acknowledges the Put Block Request, but also supplies the data to be written.
The eight data cycles are in natural order.


In both sizes of Put Block Reply packets, the PTYP, SIZE, and ADDRESS 
fields are identical to those in the request packet header, and the XDST field 
is identical to the request packet's XSRC field. ADDRESS[5:0] are ignored and 
must be zero. 


The stream hardware in the MXCC can cause a write of data cached in the
E-cache RAM. The BW is responsible for snooping all of its own operations,
including stream writes. When a BW detects a tag match on a stream write,
the BW returns a nine-cycle stream write reply containing the 64 bytes of data.
The MXCC writes the data into the E-cache RAM and invalidates the internal
caches in the SSP by issuing invalidate cycles on the VBus.


IO Get Single Request 


This command is used to fetch a single (up to doubleword) datum from the IO 
address space. 


An IO Get Single Request packet requests that the single datum specified by
the ADDRESS and SIZE fields in the header cycle be returned via an IO Get
Single Reply packet. An IO Get Single Request packet is two cycles long. The
first cycle is the header; the second cycle is unused.
IO Get Single Reply


An IO Get Single Reply packet is a response to an earlier IO Get Single
Request packet. This packet is two cycles long. The header cycle reflects most
of the information in the request header, while the data cycle supplies the
requested single. PTYP, SIZE, and ADDRESS are identical to those in the
request header, and the XDST field is identical to the request packet's XSRC
field. For SIZE less than 64 bits, the requested data is in its natural position
based on ADDRESS and SIZE. For example, a single byte requested from an
address with ADDRESS[2:0] = 6 would be on DATA[15:8]. Other fields of DATA
should be ignored by the recipient of the reply.
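The natural-position rule (big-endian byte lanes) applies both to IO Get Single replies and to IO Put Single requests. The following is a hypothetical C sketch, assuming ADDRESS is aligned to SIZE as the ADDR field description requires and that SIZE is given in bytes (1, 2, 4, or 8); all function names are illustrative.

```c
#include <stdint.h>

/* Returns 1 if addr is aligned for a single of size_bytes (1, 2, 4 or 8). */
static int single_aligned(uint64_t addr, unsigned size_bytes)
{
    if (size_bytes != 1 && size_bytes != 2 && size_bytes != 4 && size_bytes != 8)
        return 0;
    return (addr & (size_bytes - 1u)) == 0;
}

/* All-ones mask covering size_bytes bytes. */
static uint64_t size_mask(unsigned size_bytes)
{
    return size_bytes >= 8 ? ~UINT64_C(0)
                           : (UINT64_C(1) << (8u * size_bytes)) - 1;
}

/* Bit number of the datum's least significant bit within DATA[63:0].
 * Big-endian: byte offset 0 occupies DATA[63:56], offset 7 DATA[7:0].
 * Precondition: single_aligned(addr, size_bytes). */
static unsigned lane_shift(uint64_t addr, unsigned size_bytes)
{
    unsigned offset = (unsigned)(addr & 7u);
    return 8u * (8u - offset - size_bytes);
}

/* Place a value in its natural position (e.g. for IO Put Single). */
static uint64_t single_to_data(uint64_t addr, unsigned size_bytes, uint64_t value)
{
    return (value & size_mask(size_bytes)) << lane_shift(addr, size_bytes);
}

/* Extract a single from its natural position (e.g. from IO Get Single Reply). */
static uint64_t data_to_single(uint64_t addr, unsigned size_bytes, uint64_t data)
{
    return (data >> lane_shift(addr, size_bytes)) & size_mask(size_bytes);
}
```

For a byte at ADDRESS[2:0] = 6 the shift is 8, which places it on DATA[15:8] as in the example above; a byte at offset 0 lands on DATA[63:56].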


IO Put Single Request


This command is used to write a single (up to 64 bits) datum into the IO address
space.


The IO Put Single Request packet requests that a given value be written into
the single specified by the ADDRESS and SIZE fields of the header. The value
to be written may be a byte, a half-word, a word, or a doubleword as specified
in the SIZE field. An IO Put Single packet is two cycles long. The first cycle is
the header, while the second cycle contains the value to be written. The data
sent in the data cycle must be aligned into the field of DATA[63:0] that
corresponds to the selected ADDRESS and SIZE. For example, a single byte
to be stored in an address with ADDRESS[2:0] = 0 must be supplied on
DATA[63:56]. Other fields of DATA are ignored by the recipient.


IO Put Single Reply

An IO Put Single Reply packet acknowledges that an earlier IO Put Single 
Request is complete. The header cycle reflects most of the information from 
the request packet header; the second cycle is unused. PTYP, SIZE, and 
ADDRESS are identical to those in the request header, and the XDST field is 
identical to the request packet's XSRC field. 


IO Get Block Request 




This command is used to read a block of data from the IO address space. 


An IO Get Block Request packet requests that the block specified by the
ADDRESS field in the header be returned via an IO Get Block Reply packet.
An IO Get Block Request packet is two cycles long. The first cycle contains the
header; the second cycle is unused. ADDRESS[5:3] must be zero.
ADDRESS[2:0] are ignored.







IO Get Block Reply 
An IO Get Block Reply packet is a response to an earlier IO Get Block Request
packet. The packet is nine cycles long. The header cycle reflects most of the
information in the request header, while the eight data cycles transmit the
requested block of data in natural order (see Flush Block Reply, page 19-25).


PTYP, SIZE, and ADDRESS are identical to those in the request header. The 
XDST field is identical to the request packet's XSRC field. 


IO Put Block Request


This command is used to write a block of data into the IO address space.


An IO Put Block Request packet requests that a given value be written into the block
specified by the ADDRESS field of the header. The packet is nine cycles long.
The first cycle contains the header; the subsequent cycles contain the data.
The data is present in natural order (see Flush Block Reply, page 19-25).
ADDRESS[5:3] must be zero. ADDRESS[2:0] are ignored.


IO Put Block Reply


An IO Put Block Reply packet acknowledges that the write requested by an
earlier IO Put Block Request is complete. The packet is two cycles long. The
first cycle contains the header; the second cycle is unused. PTYP, SIZE, and
ADDRESS are identical to those in the request packet. The XDST field is
identical to the request packet's XSRC field.


Demap Request


Demap is used to remove the virtual-to-physical mapping for one or more virtual
pages. When a processor demaps a page, its MXCC issues a Demap Request
on XBus with XDST = 0x10 (BWs). A BW communicates the request over a
system bus. Other BWs on the system bus react to the request by sending a
Demap Request packet on their local XBuses to their MXCCs. Once an MXCC
has acted on the Demap Request from the bus, it replies with a Demap Reply
packet to its local BWs. When all processor/MXCC pairs have replied, the
originating BW replies to its MXCC with a Demap Reply packet, indicating that
the demap has been completed by all processors.


A Demap Request packet is two cycles long. The first cycle is the header cycle. The
ADDRESS field of the header must be zero. The second cycle of the packet
is the demap data cycle. The format of the demap data cycle is shown in
Figure 19-11.
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Figure 19-11. Demap Data Cycle Format

Bits:    63-48    47-32    31-12    11-8    7-0
Field:   ZERO     CNTXT    VPN      DTYP    RSVD


ZERO     Unused 16-bit field of zeros.

CNTXT    Context.

VPN      Virtual Page Number.

DTYP     Demap Type. The encoding of this field is identical to the
         SuperSPARC processor demap types in Table 9-6.

RSVD     Reserved. Must be zero.


Demap Reply


A Demap Reply packet acknowledges an earlier Demap Request packet. The
reply is two cycles long. The first cycle contains the header. The ADDRESS field
in a Demap Reply must be zero. The second cycle is unused.
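For illustration, the demap data cycle of Figure 19-11 can be packed and unpacked with shifts and masks. This C sketch uses hypothetical helper names; it treats DTYP as an opaque 4-bit code (its values are defined in Table 9-6) and assumes the caller already holds the 20-bit VPN.

```c
#include <stdint.h>

/* Pack a demap data cycle (Figure 19-11). ZERO (bits 63:48) and
 * RSVD (bits 7:0) are left as zeros, as the format requires. */
uint64_t demap_data_cycle(uint16_t cntxt, uint32_t vpn, unsigned dtyp)
{
    return ((uint64_t)cntxt << 32)              /* CNTXT, bits 47:32 */
         | ((uint64_t)(vpn & 0xFFFFFu) << 12)   /* VPN,   bits 31:12 */
         | ((uint64_t)(dtyp & 0xFu) << 8);      /* DTYP,  bits 11:8  */
}

/* Field extractors for the receiving MXCC. */
uint16_t demap_cntxt(uint64_t d) { return (uint16_t)(d >> 32); }
uint32_t demap_vpn(uint64_t d)   { return (uint32_t)(d >> 12) & 0xFFFFFu; }
unsigned demap_dtyp(uint64_t d)  { return (unsigned)(d >> 8) & 0xFu; }
```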


Interrupt Request and Interrupt Reply
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This command is used to generate processor interrupts. Each interrupt has an 
initiating XBus, and one or more target XBuses. The initiating XBus contains 
the device that generated the interrupt, while the target XBuses are the ones 
whose processors are to be interrupted. The interrupt command is used both 
at the initiating end and the target ends. 


When a processor initiates an interrupt, its MXCC sends an Interrupt Request 
packet to its BWs, one of which communicates the request on a system bus. 
The BW also sends an Interrupt Reply packet to the initiating MXCC when the 
Interrupt Request has been successfully sent on a system bus. This reply to 
the originating MXCC serves as an acknowledgement. If the originating MXCC 
is also one of the targets of the Interrupt Request, the reply packet also serves 
to request that MXCC interrupt the local processor. 


When a processor is the target of an interrupt initiated from another XBus, the
interrupt request arrives over the system bus, and the target BW sends the
request to the target MXCC as an Interrupt Request packet. Once the target
MXCC has interrupted its processor, it acknowledges the Interrupt Request
packet with an Interrupt Reply packet.


The Interrupt Request packet is a two-cycle packet. The first cycle contains the
header. The ADDRESS field of the header must be zero. The second cycle is
an Interrupt Data Cycle. The format of an Interrupt Data Cycle is shown in
Figure 19-12.
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Figure 19-12. Interrupt Data Cycle Format

Bits:    63-32    31-15    14-0
Field:   ZERO     SYS      SETBITS


ZERO      Unused 32-bit field of zeros.


SYS       Uninterpreted information passed through from the MXCC
          interrupt generation register to the BW for system-specific use.


SETBITS   Bits to Set. Specifies the bits that are to be set in the target
          MXCC's interrupt pending register. See Section 16.7.
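A sketch of building the interrupt data cycle of Figure 19-12, and of the target-side update, follows. The mapping of SETBITS onto the interrupt pending register is defined in Section 16.7; this sketch assumes, purely for illustration, that bit i of SETBITS sets bit i of the register. Function names are hypothetical.

```c
#include <stdint.h>

/* Pack an Interrupt Data Cycle (Figure 19-12):
 * ZERO in bits 63:32, SYS in bits 31:15, SETBITS in bits 14:0. */
uint64_t interrupt_data_cycle(uint32_t sys, uint16_t setbits)
{
    return ((uint64_t)(sys & 0x1FFFFu) << 15) | (uint64_t)(setbits & 0x7FFFu);
}

/* Target side: OR SETBITS into the interrupt pending register.
 * Assumption: SETBITS bit i maps to register bit i. */
uint32_t apply_setbits(uint32_t pending, uint64_t data_cycle)
{
    return pending | (uint32_t)(data_cycle & 0x7FFFu);
}
```

Because the target only sets bits, interrupts that are already pending are never cleared by a later Interrupt Request.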


Swap Single Request


This command is used to perform an atomic load-store on a single datum (up
to 64 bits) in the 36-bit physical address space.

A Swap Single Request packet requests that a given value be atomically
written into the location specified by the ADDRESS and SIZE fields of the
header. The value written may be a byte, half-word, word, or doubleword, as
specified by the SIZE field. A Swap Single Request packet is two cycles long.
The first cycle is the header; the second cycle contains the value to be written.
As in Put Single Request and Get Single Reply, if the SIZE is less than 64 bits,
the data is on the field of DATA determined by SIZE and ADDRESS[2:0]. Other
fields of DATA are ignored and may contain unrelated information.


Swap Single Reply


A Swap Single Reply packet is used to acknowledge an earlier Swap Single
Request. The header cycle reflects most of the information in the request
header, while the second cycle contains the data to be written, just as in the
request packet. This is the same as for a Put Single Reply because the swap
operation is actually performed in cache, not in the system memory.

PTYP, SIZE, and ADDRESS are identical to those in the request packet
header. The XDST field is identical to the request packet's XSRC field.


IO Swap Single Request


This command performs an atomic load-store on a single datum (up to 64 bits)
in the IO space.


An IO Swap Single Request packet requests that a given value be written into
the location specified by the ADDRESS and SIZE fields of the header and that the old
value be returned. The value may be a byte, half-word, word, or doubleword
as specified by the SIZE field. The packet is two cycles long. The first cycle is
the header; the second cycle is the data value to be written.
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IO Swap Single Reply 


An IO Swap Single Reply packet acknowledges that an earlier IO Swap Single
Request is complete. The first cycle contains the header; the second cycle
contains the data returned by the destination device for the atomic load-store.
The returned value is presumably the location's previous contents.

PTYP, SIZE, and ADDRESS are identical to those in the request packet 
header. The XDST field is identical to the request packet's XSRC field. 
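The IO Swap Single exchange can be modeled on a single 64-bit doubleword. This C sketch assumes the device returns the old value in the same DATA lanes that carried the new value (the text says only that the returned value is presumably the previous contents) and that the datum is naturally aligned; the type and function names are hypothetical, and atomicity is represented only by doing the read and the write in one call.

```c
#include <stdint.h>

typedef struct {
    uint64_t new_contents;  /* doubleword after the swap */
    uint64_t reply_data;    /* DATA[63:0] of the IO Swap Single Reply */
} swap_result;

/* Model an IO Swap Single on one doubleword. 'request_data' is the data
 * cycle of the request, already lane-aligned by ADDRESS[2:0] and SIZE. */
swap_result io_swap_single(uint64_t stored, uint64_t request_data,
                           unsigned size_bytes, unsigned addr_lo)
{
    unsigned shift = 64u - 8u * (addr_lo + size_bytes);   /* big-endian lanes */
    uint64_t mask = size_bytes >= 8 ? ~0ULL : (1ULL << (8u * size_bytes)) - 1u;
    uint64_t lanes = mask << shift;

    swap_result r;
    r.reply_data = stored & lanes;                               /* old value */
    r.new_contents = (stored & ~lanes) | (request_data & lanes); /* new value */
    return r;
}
```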


19.7.2 Tag Commands 


Tag commands are used to keep the cache tags in the MXCC for the E-cache and the
tags in the SSP consistent with the system bus tags in the BW(s). The six-bit TCmd
field within a packet header specifies how the tag for the sub-block referenced
by the address part of the header should be manipulated. BWs and the MXCC
use these TCmds to keep each other's tags synchronized. The portion of the
tag of concern to tag commands contains some address comparison
(ACOMP) bits and four flag fields:


• Shared
  The shared bit indicates whether the entry is present in more than one
  cache on the system bus.


• Owner
  The owner bit indicates whether this copy should respond to reads over
  the system bus.


• Valid
  The Valid field indicates whether this entry is valid. The tag entry is valid if
  this field is one and invalid if the field is zero.


• Pending
  The pending bit is useful for packet-switched buses, indicating whether an
  entry has a packet outstanding on the system bus. A packet is outstanding
  if a request packet has been sent for which no reply has been received.


The ACOMP field of the tags contains those bits of ADDRESS that are not 
used to index into the tag array. 


The TCmd field of a packet header consists of three sub-fields, as shown in
Figure 19-13. TCmd encodes the tag command and applies to the sub-block
addressed by the address field of the header.


Figure 19-13. TCMD Field of Header Cycle 
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The sub-fields of TCmd are SHCMD, OWCMD, and VCMD. Their actions are
largely but not entirely orthogonal. When SHCMD and OWCMD are both 11,
neither of these fields has any effect. The interpretation of the sub-fields is as
follows:


SHCMD:   Shared Bit Command. Indicates how the shared bit for
         the addressed sub-block should be manipulated. See
         Table 19-6.


Table 19-6. Shared Bit Commands

SHCMD    ACTION
00       Write a value of 0
01       Write a value of 1
10       Expect a value of 0
11       Expect a value of 1, or no-op when OWCMD is also 11


The two codes for expected values are provided to the
destination device for information and error checking.
The MXCC reports a cache-consistency error (see Section 15.8)
if the expected value from the SHCMD field of a packet from the
system does not match the E-cache tag S bit of the addressed
sub-block. A BW implementation is not required to check that
the local copy of the tag S bit has the expected value, but may
do so for better error detection.


OWCMD:   Owner Bit Command. Indicates how the owner bit
         should be manipulated. See Table 19-7.


Table 19-7. Owner Bit Commands

OWCMD    ACTION
00       Write a value of 0
01       Write a value of 1
10       Expect a value of 0
11       Expect a value of 1, or no-op when SHCMD is also 11


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The two codes for expected values are provided to the
destination device for information and error checking.
The MXCC reports a cache-consistency error (see Section 15.8)
if the expected value from the OWCMD field of a packet from the
system does not match the E-cache tag's O bit for the addressed
sub-block. A BW implementation is not required to check that
the local copy of the tag O bit has the expected value, but may
do so for better error detection.


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VCMD:    Valid Bit Command. Indicates how the valid bit should
         be manipulated. See Table 19-8.


Table 19-8. Valid Bit Commands

VCMD     ACTION
00       Ext Write Invalidate
01       Clear Valid
10/11    Set Valid and Write Tag is one of these two codes; the
         other code is illegible in the source


Ext Write Invalidate and Clear Valid are the same to the
MXCC. Ext Write Invalidate is used by a BW to signal the
MXCC that the valid bits for a cache sub-block should
be cleared as a result of an external write from another
processor. Clear Valid is used to remove a block from
the cache. The action of these VCMDs is conditional on
the P (pending) bit of the MXCC's sub-block tag. If the P bit
is clear (no operation pending on this sub-block), the
valid bit for the sub-block is cleared. If the P bit is set,
another operation is pending on this sub-block, and the
VCMD is ignored.


Set Valid and Write Tag is used by a BW to signal the MXCC
that the information is new and the processor-side tags
should be updated to maintain consistency between the
tags.
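To make the sub-field rules concrete, here is a C sketch of how a recipient might apply one TCmd to a sub-block tag. It follows Tables 19-6 and 19-7 and the VCMD text above, but takes VCMD as an already-decoded action because not all of its encodings are given here; the struct and function names are hypothetical. The return value reports whether an "expect" code disagreed with the tag, which the MXCC would report as a cache-consistency error.

```c
#include <stdbool.h>

/* Hypothetical model of the flag bits of one sub-block tag. */
typedef struct {
    bool shared;   /* S */
    bool owner;    /* O */
    bool valid;    /* V */
    bool pending;  /* P */
} subblock_tag;

/* VCMD actions, already decoded from the 2-bit field. */
typedef enum {
    VCMD_EXT_WRITE_INVALIDATE,
    VCMD_CLEAR_VALID,
    VCMD_SET_VALID_WRITE_TAG,
    VCMD_NONE
} vcmd_action;

/* Apply a 2-bit SHCMD or OWCMD code to one flag (Tables 19-6, 19-7).
 * Returns false if an "expect" code does not match the flag. */
static bool apply_bit_cmd(unsigned cmd, bool *bit, bool both_11)
{
    switch (cmd & 3u) {
    case 0:  *bit = false; return true;          /* 00: write 0 */
    case 1:  *bit = true;  return true;          /* 01: write 1 */
    case 2:  return !*bit;                       /* 10: expect 0 */
    default: return both_11 || *bit;             /* 11: expect 1, or no-op */
    }
}

/* Returns true if no expected-value mismatch was found. */
bool apply_tcmd(subblock_tag *t, unsigned shcmd, unsigned owcmd, vcmd_action v)
{
    bool both_11 = (shcmd & 3u) == 3u && (owcmd & 3u) == 3u;
    bool ok = apply_bit_cmd(shcmd, &t->shared, both_11);
    ok = apply_bit_cmd(owcmd, &t->owner, both_11) && ok;

    switch (v) {
    case VCMD_EXT_WRITE_INVALIDATE:              /* same as Clear Valid to the MXCC */
    case VCMD_CLEAR_VALID:
        if (!t->pending)                         /* ignored while P is set */
            t->valid = false;
        break;
    case VCMD_SET_VALID_WRITE_TAG:
        t->valid = true;                         /* processor-side tag update not modeled */
        break;
    case VCMD_NONE:
        break;
    }
    return ok;
}
```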
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19.8 MXCC Use of XBus 


The MXCC issues only packets with certain combinations of header fields.
Furthermore, it expects only certain packets from XBus. Other combinations
are not used. BW designers will find the following information helpful in
designing BWs to work with the MXCC.


19.8.1 XBus to MXCC 


Table 19-9 contains the header field bits that the MXCC expects in request 
packets from XBus. The patterns in the TCmd field may differ from those 
shown, and any valid TCmd may be used. Different TCmd fields may be 
needed to support different system buses. 


The MXCC Op field contains the name of the internal operation in the MXCC
that generates this packet (for packets from an MXCC) or the internal operation
generated by this packet (for packets to an MXCC). In the MXCC Op field, NC
means "Non-Cacheable," and CD means "Cache Disabled."


Table 19-9. Summary of XBus Request Packets to MXCC

[Table body not legible in this copy. For each MXCC operation (MXCC Op), the
table lists the expected header field values, including the PTYP, XDST, XSRC,
SIZE, and TCmd fields, and the corresponding XBus request.]


In Table 19-9, values are indicated in binary. The following symbols are used 
to indicate certain variable data: 



















x        Packet-dependent. Can be either 0 or 1.


E        This bit is set on a reply when the corresponding request
         cannot be processed correctly. When the error bit is set,
         the least significant three bits of the first data cycle provide
         additional information about the error.
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ab       Data Size. The data size is encoded into these two bits
         as:

         • 00: Byte
         • 01: Halfword
         • 10: Word
         • 11: Doubleword


nn       BW number. If there is only one BW, it is
         BW number 01. If there are only two BWs, they are BW
         numbers 00 and 01.
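Because each ab code is the base-2 logarithm of the datum size in bytes, decoding it is a single shift. A minimal C sketch with hypothetical names:

```c
/* Names for the two-bit ab data-size code. */
static const char *const xbus_size_names[4] = {
    "byte", "halfword", "word", "doubleword"
};

/* ab = 00, 01, 10, 11 selects 1, 2, 4, or 8 bytes. */
unsigned xbus_size_bytes(unsigned ab) { return 1u << (ab & 3u); }

const char *xbus_size_name(unsigned ab) { return xbus_size_names[ab & 3u]; }
```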


Summary of XBus Reply Packets 


Table 19-10 contains the header field bits that the MXCC expects in reply 
packets from XBus. Reply packets always match a previous request packet 
sent by the MXCC. 


The patterns in the TCmd field may differ from those shown. Any valid TCmd
may be used. Different TCmd fields may be needed to support different system
buses. The ERR field is normally 0 but may be 1 if an error was encountered
when the request was processed.
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Table 19-10. Summary of XBus Reply Packets to MXCC

[Table body not legible in this copy.]


19.8.2 MXCC to XBus


Table 19-11 contains the header field bits that the MXCC sends in request 
packets to XBus. 
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Table 19-11. XBus Requests from MXCC to XBus

[Table body not legible in this copy.]
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In Table 19-11, values are indicated in binary. The following symbols are used 
to indicate certain variable data: 


x        Packet-dependent. May be either 0 or 1.


ab       Data Size. The data size is encoded into these two bits
         as:

         • 00: Byte
         • 01: Halfword
         • 10: Word
         • 11: Doubleword


Replies from MXCC to XBus 


Table 19-12 contains the header field bits that the XBus expects in reply
packets from MXCC. The patterns in the TCmd field may differ from those
shown. Any valid TCmd may be used. Different TCmd fields may be needed
to support different system buses.


Table 19-12. Replies from MXCC to XBus

[Table body not legible in this copy.]
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19.9 Write Hits in Stream Transfers 




The stream hardware in the MXCC can cause a write of data cached in the 
E-cache RAM. The bus watcher is responsible for snooping all of its own 
operations, including stream writes. When a BW detects a tag match on a 
stream write, the BW returns a nine-cycle stream write reply containing the 64 
bytes of data. MXCC writes the data into the E-cache RAM and invalidates the 
internal caches in the SSP by issuing invalidates on the VBus. 
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19.10 Arbitration and Flow Control 


The MXCC contains a built-in, pipelined arbiter to control access to XBus. The 
arbiter performs the basic function of controlling access to the bus so that only 
a single device drives XBus on each cycle. It contains additional functions to 
reduce the average time needed for arbitration. 


The use of queues is inherent in packet-based communication. Queues have 
finite size, however, so too many packets destined for a single destination can 
exceed the capacity of a receiving device's queue. Flow control stops any
sender that could overflow a receiver's queue until that queue has sufficient
capacity for the incoming packet.


Each request to the arbiter for access to XBus encodes the length and priority 
of the packet to be sent. Refer to Subsection 19.4.4 for information about the 
priority of XBus packets. 


Arbitration and flow control are intimately related for XBus, and are both linked 
to packet priority. Each device has low- and high-priority queues. A packet of 
a certain length and priority can be sent if the receiving device has space for 
at least that length in the incoming queue for that priority. Otherwise, the 
sender is blocked until there is sufficient room in the receiving queue. 
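The admission rule in the preceding paragraph can be sketched as follows. The model is hypothetical: it tracks free space per priority in units of XBus cycles (a packet being two or nine cycles long) and reduces reclaiming space to a simple release call.

```c
#include <stdbool.h>

enum xbus_priority { XBUS_LOW = 0, XBUS_HIGH = 1 };

/* Hypothetical receiver: free space, in XBus cycles, in each incoming queue. */
typedef struct { unsigned free_cycles[2]; } rx_queues;

/* A packet may be sent only if the queue for its priority can hold all of it. */
bool can_send(const rx_queues *rx, enum xbus_priority p, unsigned packet_cycles)
{
    return rx->free_cycles[p] >= packet_cycles;
}

/* Send if permitted; otherwise the sender stays blocked and retries later. */
bool try_send(rx_queues *rx, enum xbus_priority p, unsigned packet_cycles)
{
    if (!can_send(rx, p, packet_cycles))
        return false;
    rx->free_cycles[p] -= packet_cycles;
    return true;
}

/* Called when the receiver drains a packet from the queue. */
void release(rx_queues *rx, enum xbus_priority p, unsigned packet_cycles)
{
    rx->free_cycles[p] += packet_cycles;
}
```

Keeping the two priorities in separate queues means a full low-priority queue never blocks high-priority traffic.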


19.10.1 XREQ Decoding 


BWs request use of XBus on two dedicated lines to the MXCC, which contains
the XBus arbiter. The meaning of the XREQn[1:0] signals depends on the
sequence of values on the two lines. The sequences used and their meanings
are described in Table 19-13.
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Table 19-13. Arbitration and Flow Control Encoding

FIRST CYCLE XREQn[1:0]  |  SECOND CYCLE XREQn[1:0]  |  MEANING

[Table rows not legible in this copy.]
The sequential nature of the XREQn[1:0] signals describes a finite state
machine (FSM). Figure 19-14 represents the state machine that interprets the
XREQn values to determine whether to prevent the MXCC from sending low-
and high-priority messages.


If a value of HH (i.e., neither XREQn[1] nor XREQn[0] asserted) has been
present for two or more cycles, the state machine is at IDLE. If the machine
receives a value of HL (XREQn[1] = false; XREQn[0] = true), the state machine
moves to state HOLD LOW, the state where the low-priority queue is signalled
to block for the next nine cycles. If, at the next clock, the value is again HL, the
state machine moves to HOLD LOW HIGH, the state where both low- and
high-priority queues are signalled to hold. If the value is HH, the state machine
returns to IDLE. For any other value, the state machine goes to the command
decode state COMMAND.
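The transitions described above can be written as a step function. This C sketch encodes only what the text states for IDLE and HOLD LOW; treating HOLD LOW HIGH and COMMAND as the end of a two-cycle sequence, and sending any value other than HH or HL from IDLE to COMMAND, are assumptions of the sketch. All names are hypothetical.

```c
/* XREQn[1:0] values in the text's notation: H = deasserted, L = asserted.
 * The first letter is XREQn[1], the second XREQn[0]. */
typedef enum { XQ_HH, XQ_HL, XQ_LH, XQ_LL } xreq_value;

typedef enum { ST_IDLE, ST_HOLD_LOW, ST_HOLD_LOW_HIGH, ST_COMMAND } block_state;

block_state xreq_step(block_state s, xreq_value v)
{
    if (s == ST_HOLD_LOW) {
        /* Second cycle of a sequence that began with HL. */
        if (v == XQ_HL) return ST_HOLD_LOW_HIGH;  /* hold low- and high-priority */
        if (v == XQ_HH) return ST_IDLE;
        return ST_COMMAND;
    }
    /* IDLE, or (assumption) the end of a completed two-cycle sequence:
     * decode v as the first cycle of a new sequence. */
    if (v == XQ_HL) return ST_HOLD_LOW;           /* hold low-priority sends */
    if (v == XQ_HH) return ST_IDLE;
    return ST_COMMAND;                            /* assumption: start of a request */
}
```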
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Figure 19-14. Message Blocking State Machine

The XREQn encodings that block the REQUEST and REPLY packets from the
MXCC are not cumulative. They signify that, from this time forward (for nine
cycles), the MXCC should not send any REQUEST or REPLY packets. This
is illustrated in Figure 19-15, which depicts message packets from the MXCC
to a particular BW. In cycle 1 the BW inhibits REQUEST packets for nine cycles
(i.e., the first REQUEST packet directed at this BW would be in cycle 12). In
cycle 2 the BW inhibits all packets for nine cycles. (The earliest the MXCC
could send a packet would be cycle 13.) In cycle 3 the BW again inhibits
packets for nine cycles (now the earliest packet would be in cycle 14). Since
XREQn signals are clocked before being used, the actual blocking is delayed
for one cycle. The blocking period for the first block is from the start of cycle
3 to the end of cycle 11.
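The non-cumulative, delayed blocking rule can be modeled with one window per packet class (one for REQUEST packets, one for REPLY packets). In this hypothetical C sketch, a block signalled in cycle c covers cycles c + 2 through c + 10, and a later signal restarts the nine-cycle count rather than adding to it, which reproduces the cycle numbers in the example above.

```c
/* Hypothetical model of one XREQn blocking window (one per packet class). */
#define BLOCK_CYCLES 9  /* a block lasts nine cycles */
#define SAMPLE_DELAY 2  /* signalled in cycle c, effective from cycle c + 2 */

typedef struct {
    long start;  /* first blocked cycle */
    long end;    /* last blocked cycle, inclusive; end < start means none */
} block_window;

/* A block signalled in cycle c. A new signal restarts the count; it does
 * not add nine more cycles to whatever remains. */
void block_signalled(block_window *w, long c)
{
    long s = c + SAMPLE_DELAY;
    long e = s + BLOCK_CYCLES - 1;
    if (w->end < s - 1)   /* previous window already over: open a new one */
        w->start = s;
    if (e > w->end)
        w->end = e;
}

/* Earliest cycle, at or after 'cycle', in which a packet may begin. */
long earliest_send(const block_window *w, long cycle)
{
    if (cycle >= w->start && cycle <= w->end)
        return w->end + 1;
    return cycle;
}
```

A cycle before the window opens is still available, which matches the case of a header that starts before the block takes effect.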
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Figure 19-15. Sequential Blocking Of Messages


A header may arrive before the block becomes effective. Figure 19-16 shows
a request for a low-priority packet and blocking of all MXCC packets. Note that
a packet begins before the block occurs, and that the effective blocking period
is for only one cycle.


Figure 19-16. Header Received Before Block





19.10.2 Arbitration Priorities 
The XBus arbiter in the MXCC supports four priorities. Listed in descending
priority order, they are:


CC HIGH    XBus arbitration requests from the MXCC to send reply
           packets to a BW (highest priority).


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BW HIGH    XBus arbitration requests from a BW to send block read
           reply packets to the MXCC.


BW LOW     XBus arbitration requests from a BW to send system
           request packets and most system bus reply packets to the
           MXCC.


CC LOW     XBus arbitration requests from the MXCC to send
           request packets to a BW (lowest priority).


Requests at the same priority level are serviced round-robin, and requests
at a higher level are serviced first. For example, all requests at BW LOW
priority are granted before any XBus requests at CC LOW are granted.


When a high-priority reply packet is present in a BW (destined for the MXCC),
the MXCC issues a grant for that high-priority packet, in effect changing the
priorities for nine cycles to:


(BW HIGH > CC HIGH > BW LOW > CC LOW)
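A behavioral sketch of the arbiter follows: strict priority between levels, round-robin among requesters within a level, and the BW HIGH / CC HIGH swap while a high-priority BW reply is waiting. The requester count per level and all names are hypothetical, and the nine-cycle duration of the swap is left to the caller.

```c
#include <stdbool.h>

/* Priority levels, listed in the default descending order. */
enum { CC_HIGH, BW_HIGH, BW_LOW, CC_LOW, NUM_LEVELS };

#define MAX_REQ 4   /* hypothetical number of requesters per level */

typedef struct {
    bool pending[NUM_LEVELS][MAX_REQ]; /* pending[level][requester] */
    int  last[NUM_LEVELS];             /* last requester granted per level */
} arbiter;

/* Grant one request. Returns the level granted and stores the requester
 * in *who, or returns -1 if nothing is pending. */
int arbitrate(arbiter *a, bool bw_high_reply_waiting, int *who)
{
    static const int normal[NUM_LEVELS]  = {CC_HIGH, BW_HIGH, BW_LOW, CC_LOW};
    static const int swapped[NUM_LEVELS] = {BW_HIGH, CC_HIGH, BW_LOW, CC_LOW};
    const int *order = bw_high_reply_waiting ? swapped : normal;

    for (int i = 0; i < NUM_LEVELS; i++) {
        int lvl = order[i];
        for (int k = 1; k <= MAX_REQ; k++) {   /* round-robin after last winner */
            int r = (a->last[lvl] + k) % MAX_REQ;
            if (a->pending[lvl][r]) {
                a->pending[lvl][r] = false;
                a->last[lvl] = r;
                *who = r;
                return lvl;
            }
        }
    }
    return -1;
}
```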


19.10.3 Flow Control in MXCC's Queues 


There are six queues in the MXCC, as shown in Figure 19-5. The flow control 
mechanism for each is described below. 


XIL/XIH    XIL holds low-priority packets from BWs, and XIH holds
           high-priority packets from BWs. They are flow-controlled
           by the arbiter. The arbiter refuses to grant requests to
           any of the BWs when either queue is above its high water
           mark.


ARBL       Holds arbitration requests from BWs to use XBus to
           send low-priority packets and is not flow-controlled.
           This FIFO contains as many entries as there are entries
           in the BW XIL FIFO. It cannot overflow because XIL
           does not overflow.


ARBH       Holds arbitration requests from BWs to use XBus to
           send REPLY packets. ARBH contains a single entry. It
           cannot overflow because the MXCC will permit only a
           single outstanding request packet that will generate
           high-priority reply packets from the BWs.


XOL/XOH    XOL holds low-priority packets to the BWs, and XOH holds
           high-priority packets to the BWs. These are prevented
           from overflowing because the MXCC prevents the processor
           from accessing VBus when these queues reach their high
           water marks.
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19.10.4 Flow Control in Bus Watcher's Queues 
A prototypical BW has four queues. The flow control mechanism for each is
described below.

BOL/BOH    BOL (BW outgoing low-priority requests) and BOH (BW
           outgoing high-priority replies) are flow-controlled by the
           BW via its XREQn[1:0] lines, which can block either
           request or reply packets.

BIL/BIH    BIL (BW low-priority request replies) and BIH (BW
           high-priority read miss replies) must be flow-controlled on
           the system side of the BW. The BW must in some way
           prevent the system bus from issuing cycles that would
           result in reply or request packets being queued for the
           XBus when either of these queues is full.


19.10.5 Default Grantee 


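The minimum arbitration latency is five cycles if the requestor is a bus watcher
and two cycles if the requestor is the MXCC. A cache miss incurs both: the
MXCC arbitrates to send the request (two cycles), and the BW arbitrates to send
the reply (five cycles). Therefore, assuming the XBus is idle, arbitration latency
adds seven cycles to the cost of a cache miss.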


The default grantee mechanism is a way to avoid this latency on most replies. 
When the MXCC issues a read miss to a system bus through a bus watcher, 
it gives grant to that BW in anticipation of the reply packet from the system bus. 
The BW experiences no arbitration delay when the reply arrives and can 
immediately transmit the reply packet on the XBus. 


When a BW has default grant, it must be prepared for XGNTn to be deasserted
by the MXCC at any time. If it has already begun to send a transaction, it must
continue. The MXCC guarantees that no other device will obtain ownership of
the XBus during the time the BW is sending packets.


MXCC selects which BW is the default grantee based on ADDRESS[9:8] and 
the number of BWs, according to Table 19-14. 


Table 19-14. Default Grantee

Columns: number of BWs; ADDRESS[9:8] = 00, 01, 10, 11.

[Table body not legible in this copy.]

19.10.6 Bus Ownership 
A device on XBus can only drive the bus when it owns the bus. Ownership is 
determined by arbitration. 
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The granted device owns the bus exactly two cycles after the grant has been
issued (refer to Figure 19-17). The length of ownership depends on the
length of the grant signal. The owner (with the exception of the default grantee)
must drive the XDATA and parity lines exactly two cycles after the grant has
been issued. It must drive these lines only for as long as XGNTn is
asserted.


Figure 19-17. Bus Ownership


19.10.6.2 Default Grantee





A default grantee gains ownership of the bus exactly two cycles after the grant 
has been issued. BWs are given default grant only for nine-cycle packets. 
Under some circumstances the grant will be negated before the default 
grantee has used the bus. The change of ownership follows very exact rules: 


Rule 1: If the BW detects a grant and it did not issue a request for the bus
        (using the XREQn[1:0] lines) two cycles previously, the grant
        must be a default grant.


Rule 2: The BW may drive a high-priority reply message from the system
        bus any time a DEFAULT GRANT has been issued, and, once
        the message is started, it must finish driving the header and eight
        data words. Refer to Figure 19-18. The default grant (XGNTn)
        is issued in cycle 1 and latched into the BW at cycle 2. It controls
        BW internal logic during cycle 3. Cycle 3 is the earliest cycle in
        which a header may appear in reaction to the default grant. The
        MXCC removes the default grant at cycle a, and the last point at
        which the BW may drive a header is cycle c. The MXCC may
        issue a grant to another device during cycle e if no header was
        driven during cycle c.
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Rule 3: It is possible for a BW eligible to become a default bus grantee 
to issue a request for the XBus. Figure 19-19 shows such a 
situation. If the BW issues a request for the bus in cycle a and 
cycle b, the MXCC first reacts to the request in cycle c by 
removing the default bus grant. The last cycle in which the BW 
may issue a header for the default grant is cycle d. The grant 
from the MXCC for the low-priority request is issued during 
cycles d and e. The BW would drive the header during cycle f. 
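Rule 1 amounts to a check against the BW's own request history. The following C sketch assumes the BW records, every cycle, whether it drove a request on XREQn[1:0]; the ring size and all names are hypothetical.

```c
#include <stdbool.h>

#define HISTORY 4   /* hypothetical ring size; must exceed 2 */

typedef struct { bool requested[HISTORY]; } bw_history;

static unsigned slot(long cycle)
{
    long m = cycle % HISTORY;
    return (unsigned)(m < 0 ? m + HISTORY : m);
}

/* Record, every cycle, whether this BW drove a bus request on XREQn[1:0]. */
void record_cycle(bw_history *h, long cycle, bool drove_request)
{
    h->requested[slot(cycle)] = drove_request;
}

/* Rule 1: a grant seen in cycle n with no request from this BW in
 * cycle n - 2 must be a default grant. */
bool is_default_grant(const bw_history *h, long grant_cycle)
{
    return !h->requested[slot(grant_cycle - 2)];
}
```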


Figure 19-19. Request During Default Grant 





19.10.7 Message Priority Detection 




In order to correctly arbitrate for the XBus, BWs must be able to prioritize 
messages. Only system responses to GET BLOCK messages from the MXCC 
with XSRC ID of 0x02 or 0x01 should be considered high-priority messages. 
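The priority test is small enough to state directly. In this hypothetical C helper, the caller supplies whether the system response answers a GET BLOCK request and the XSRC ID that request carried.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a system response should be arbitrated as high priority:
 * only responses to MXCC GET BLOCK requests whose XSRC ID was 0x01 or 0x02. */
bool xbus_reply_is_high_priority(bool answers_get_block, uint8_t request_xsrc)
{
    return answers_get_block && (request_xsrc == 0x01 || request_xsrc == 0x02);
}
```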


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19.11 XBus Cycle Waveforms 


Figure 19-20 shows a simple two-cycle packet. The BW requests the use of
the bus by asserting 01₂, followed by 11, on the XREQn[1:0] lines. The MXCC
grants the BW the bus for two cycles, and the BW sends a two-cycle packet.


Figure 19-20. XBus Two-Cycle Packet 





Figure 19-21 shows a nine-cycle packet transmitted by a bus watcher. The
XREQn lines are driven 01₂, followed by another 01₂. This is a request for a
low-priority nine-cycle packet. The MXCC grants the bus to the BW for nine
cycles.


Figure 19-21. XBus Nine-Cycle Packet 










Figure 19-22 shows multiple BWs requesting the bus. BW0 requests the bus
for a low-priority two-cycle packet, and BW1 requests the bus for a high-priority
nine-cycle packet. The MXCC grants BW0 the bus for two cycles, then
immediately grants BW1 the bus for nine cycles.
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Figure 19-22. Multiple Requests


Figure 19-23 shows a TMS390Z55 two-cycle request packet, followed by a
reply packet from the BW. Note that the XREQn arbitration request is for a
low-priority two-cycle packet.


Figure 19-23. Two-Cycle Request and Two-Cycle Reply


Figure 19-24 shows an MXCC two-cycle request packet, followed by a 
nine-cycle reply packet from the BW. The reply packet is high-priority. 
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Figure 19-24. Two-Cycle Request and Nine-Cycle Reply 





Figure 19-25 shows an MXCC nine-cycle packet (block write) and a 
corresponding two-cycle reply packet. 


Figure 19-25. Nine-Cycle Request and Two-Cycle Reply 
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The BootBus is a simple synchronous 12-pin interface provided by the Multi- 
Cache Controller (MXCC) for accessing an EPROM for bootstrap loading and 
for accessing other low-speed peripherals. BootBus supports an address
space of 16 Mbytes. Provisions are made for reading or writing from one to eight
bytes from/to BootBus devices and for polling the devices for interrupts. BootBus
is only available in the XBus configuration (when MBSEL is low). BootBus
is accessible from both the VBus and the XBus.
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20.1 Introduction 
A block diagram of the BootBus portion of a system is shown in Figure 20-1. 


Figure 20-1. XBus System With BootBus
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20.2 BootBus Signals 
LDATA[7:0]   Carries the BootBus address, data, and interrupts, according to
             the command on LCMD.


LCMD[2:0]    BootBus command. The commands are issued by
             the MXCC and interpreted by one or more of the
             external BootBus controllers. See Table 20-2 for the
             BootBus command encoding.


LCMDS        BootBus command strobe. When asserted, this signal
             indicates that command information on LCMD[2:0] and
             write data on LDATA[7:0], for WRITE-VALID commands,
             is valid. Input data is latched on the rising edge of LCMDS.


The connection of MXCC, the customer-designed BootBus controller, and ex- 
ample peripheral devices is shown in Example 20-1. 


Example 20-1. BootBus Connections 
The LCMD[2:0] signals connect the MXCC to the BootBus controller (BBC). 
The BBC decodes the command, together with address bytes received on LDATA[7:0], to control 
access to the devices on BootBus. The BBC latches the three portions of the 
address from ADR-HIGH, ADR-MED, and ADR-LOW commands and sup- 
plies the demultiplexed address to BootBus devices, such as memories, that 
require it. It may also decode a portion of the latched address to control which 
device is accessed for read and write operations. The BBC can use this in- 
formation along with decoding LCMD[2:0] to control the output and write en- 
ables of the various BootBus devices. 


Some BootBus devices may generate interrupts. The interrupt request signals 
from these devices are combined and encoded by the BBC. The result is used 
to reply to interrupt commands from the MXCC. 


BootBus 
20.3 BootBus Addresses 
Table 20-1 illustrates BootBus address decoding. 


Table 20-1. BootBus Address Decoding 














Non-Cacheable Space 
ADDR[35:28] = 0xFF 
ADDR[27:24] = 0x0 or 0x1 
ADDR[23:00] = BootBus Address 
PA[35:28] = 0xXX 
PA[27:24] = 0x0 
PA[23:00] = BootBus Address 

PA = Physical Address; X = don't care (MXCC will ignore) 


BootBus is accessible only from the MXCC in XBus configurations. On the 
XBus, non-cacheable reads and writes with PA[24] set to zero access the 
BootBus. See Chapter 19, XBus. 
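The ADDR column of Table 20-1 amounts to a bit-field check on a 36-bit address. The following is a minimal Python sketch of that check, not TI software: the helper name `bootbus_address` is hypothetical, and the sketch models only the ADDR column (the XBus-side PA column, where PA[35:28] is ignored, is not modeled).

```python
# Hypothetical model of the Table 20-1 ADDR-column decode; not part of any TI software.
BOOTBUS_SPACE_BITS = 24                       # ADDR[23:0]: the 16-Mbyte BootBus space
BOOTBUS_MASK = (1 << BOOTBUS_SPACE_BITS) - 1

def bootbus_address(addr: int):
    """Return the BootBus address selected by a 36-bit non-cacheable address, or None."""
    if not 0 <= addr < (1 << 36):
        raise ValueError("expected a 36-bit physical address")
    if ((addr >> 28) & 0xFF) != 0xFF:           # ADDR[35:28] must be 0xFF
        return None
    if ((addr >> 24) & 0xF) not in (0x0, 0x1):  # ADDR[27:24] must be 0x0 or 0x1
        return None
    return addr & BOOTBUS_MASK                  # ADDR[23:0] passes through unchanged

# ADDR[35:28] = 0xFF with every lower bit zero selects BootBus address 0.
print(hex(bootbus_address(0xFF << 28)))
```

Returning `None` rather than raising keeps the helper usable as a simple "is this a BootBus access?" predicate.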


The first instruction fetch by the SuperSPARC Processor (SSP) after reset is 
always at physical address 0xFF00000000. In XBus configurations, this address 
always accesses BootBus. XBus systems must provide a read-only 
memory (ROM) or other source of instructions on BootBus for the system reset 
handler. 
20.4 BootBus Transactions 


BootBus Commands 


The encoding of LCMD[2:0] is shown in Table 20-2. All command cycles on 
BootBus are validated by LCMDS. 


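Table 20-2. BootBus Command Encoding 

Name           Description 
IDLE           Idle; LDATA[7:0] three-stated 
WRITE-VALID    Write data to device 
READ-VALID     Read data from device 
IDLE-WRITE     Idle write; LDATA[7:0] three-stated 
ADR-LOW        Address, low byte (bits 7:0) 
ADR-MED        Address, middle byte (bits 15:8) 
INTERRUPT      Interrupt information on LDATA[3:0] 
ADR-HIGH       Address, high byte (bits 23:16) 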












❑ Idle Command 


The idle command is used to three-state LDATA[7:0] whenever the driving 
source is changed. Both the MXCC and the BBC disable all drivers on 
LDATA during idle commands. 


❑ Write Valid Command 


The write valid command instructs the address decoder to write the se- 
lected device with the data on LDATA[7:0]. 


❑ Read Valid Command 
The read valid command instructs the address decoder to drive the se- 
lected device data onto LDATA[7:0]. 

❑ Idle Write Command 
The idle write command allows the BBC enough time to set up the targeted 
address. LDATA is three-stated during the idle write. 

❑ Address Commands 


Address low-byte, address middle-byte, and address high-byte com- 
mands are used to transmit portions of the BootBus address from the 
MXCC to the BBC. The MXCC uses three consecutive cycles on BootBus 
to send an address. The first cycle sends the high byte (bits 23:16) of the 
address. The second cycle sends the middle byte (bits 15:8) of the ad- 
dress. The third cycle sends the low byte (bits 7:0) of the address. 
An external decoder must decode the address to select the appropriate 
device. This decoder is shown as the BBC in Figure 20-1. 
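The three-cycle address transfer, and the BBC latching the bytes back into a 24-bit address, can be sketched in Python as follows. This is an illustrative model only. The class and function names are hypothetical, and only the command names from the text are used; the LCMD[2:0] bit encodings are deliberately not modeled.

```python
# Hypothetical sketch of the ADR-HIGH / ADR-MED / ADR-LOW sequence and a BBC latch.

def address_commands(bootbus_addr: int):
    """Split a 24-bit BootBus address into the three cycles the MXCC sends, in order."""
    if not 0 <= bootbus_addr < (1 << 24):
        raise ValueError("BootBus addresses are 24 bits")
    return [
        ("ADR-HIGH", (bootbus_addr >> 16) & 0xFF),  # first cycle: bits 23:16 on LDATA[7:0]
        ("ADR-MED", (bootbus_addr >> 8) & 0xFF),    # second cycle: bits 15:8
        ("ADR-LOW", bootbus_addr & 0xFF),           # third cycle: bits 7:0
    ]

class BootBusController:
    """Latches address bytes and presents the demultiplexed address to devices."""
    _SHIFT = {"ADR-HIGH": 16, "ADR-MED": 8, "ADR-LOW": 0}

    def __init__(self):
        self.address = 0

    def command(self, name: str, ldata: int):
        shift = self._SHIFT.get(name)
        if shift is not None:
            # Replace only the byte lane that this address command carries.
            self.address = (self.address & ~(0xFF << shift)) | ((ldata & 0xFF) << shift)

bbc = BootBusController()
for name, byte in address_commands(0x12AB34):
    bbc.command(name, byte)
print(hex(bbc.address))
```

The model keeps a separate latch per byte lane, matching the text's description of the BBC latching "the three portions of the address"; non-address commands leave the latch untouched.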


❑ Interrupt Command 


When the Interrupt command is asserted, the pending interrupt informa- 
tion encoded by the BBC is asserted on LDATA[3:0]. LDATA[7:4] are ig- 
nored by the MXCC during Interrupt commands. The encoding of interrupt 
information is as shown in Table 20-3. 


Table 20-3. Interrupt Encoding 










Interrupt level 1 
Interrupt level 2 
... 
Interrupt level 15 


The MXCC sends Interrupt commands whenever BootBus either com- 
pletes a Read or Write cycle or is not in use. 
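A receiver of the Interrupt-command reply only needs the low nibble of LDATA. The sketch below assumes, since the exact codes in Table 20-3 are not legible here, that LDATA[3:0] carries the interrupt level as a binary number (1 through 15) and that 0 means no interrupt is pending. This follows the usual SPARC interrupt-level convention but is an assumption, not something this section states.

```python
# Sketch only. Assumption: LDATA[3:0] = binary interrupt level, 0 = nothing pending.

def interrupt_level(ldata: int):
    """Return the pending interrupt level from an 8-bit LDATA value, or None."""
    level = ldata & 0x0F   # LDATA[7:4] are ignored by the MXCC during Interrupt commands
    return level or None

print(interrupt_level(0xA5))
```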
20.5 Multi-Byte Transfers 


The MXCC can read or write multiple bytes on BootBus with an abbreviated 
addressing sequence between the constituent bytes. Figure 20-2 shows the 
steps of a multi-byte read on BootBus. 


Figure 20-2. Steps in a Multi-Byte Read 
20.6 BootBus Example Transactions 


Example 20-2, Example 20-3, Example 20-4, Example 20-5, and 
Example 20-6 are examples of BootBus transactions. 


Example 20-2. BootBus Write 


Example 20-3. BootBus Read 







Example 20-4. BootBus Interrupt 


Example 20-5. Two-Byte BootBus Read 





SuperSPARC provides the five-signal IEEE 1149.1 JTAG serial scan interface 
to allow observation and control for board-level, Built-In Self-Test (BIST), and 
chip-level testing, and to support a remote debugging environment. 
21.1 Introduction 




The IEEE 1149.1 Standard Test Access Port and Boundary-Scan Architecture 
was developed to solve the problems of testing high-pin-count 
devices and to resolve system testing issues. 


The SuperSPARC processor (SSP) implements the protocol defined in the 
IEEE 1149.1 Standard Test Access Port and Boundary-Scan Architecture. 
Users should be familiar with this specification before reading this 
chapter. 


SuperSPARC provides the five-signal JTAG serial scan interface mechanism 
to support the following: 


❑ Manufacturing Fault Coverage 


JTAG allows test software access to internal scan logic to determine de- 
vice manufacturing correctness. 


❑ Periphery and Interconnect Testing 


JTAG allows software-controlled boundary scan to test the periphery and 
interconnect between chips on boards that use SuperSPARC. Boundary 
scan testing requires software that uses JTAG to scan in, apply, scan out, 
and compare vectors. 


❑ Built-In Self-Test 


In addition to software initiation using ASI 0x39, SSP BIST can be initiated 
by software that uses the JTAG interface. 


❑ Remote Debugging Environment 


A scan-based debugger (SDB) can use JTAG to halt an application pro- 
gram, examine or alter register and memory state (including the program 
counters), set breakpoints or counters (to specify conditions where control 
should leave the application code and return to the SDB), and resume 
execution of the application code. Such software can download SPARC 
assembly language for the intended scan-based debug function, inspect 
device-specific JTAG status, and provide a recovery mechanism when 
scan-based debug instructions fault. See Chapter 22 for more details. 
21.2 JTAG Interface 


The IEEE 1149.1 JTAG serial-scan interface mechanism is composed of a set 
of pins and a test access port (TAP) controller state machine that responds to 
those pins. The design is partitioned into several serially scanned "rings" that 
are independently accessed. Ring selection is determined by the Instruction 
Register (IR). Access to the IR scan chain is in accordance with the JTAG 
protocol. 


The SuperSPARC JTAG interface is composed of five pins: 


❑ Test Clock (TCK): The TCK signal is used to clock the TAP and the test 
data registers defined in this chapter. 


❑ Test Mode Select (TMS): The TAP controller samples TMS to determine 
its state transitions. The TAP controller state diagram can be found in 
Section 5.1 of the IEEE 1149.1 Specification. 


❑ Test Logic Reset (TRST): TRST is used to reset the SuperSPARC 
internal JTAG TAP controller. 


❑ Test Data In (TDI): Test instructions and data are serially shifted into 
SuperSPARC via TDI. 


❑ Test Data Out (TDO): Data is serially transmitted from SuperSPARC to 
the external JTAG controller via TDO. 
21.3 TAP Controller 


The TAP controller is an internal sequencer that manages access to all JTAG 
test data registers. The TAP controller examines TRST and TMS sequences 
each TCK cycle for JTAG state transitions. The state of the TAP controls 
assertion of CAPTURE, SHIFT, and UPDATE operations. See the IEEE 1149.1 
Specification for more details. 


The TAP controller state diagram can be found in Section 5.1 of the IEEE 
1149.1 Specification. 


21.3.1 JTAG Reset Requirements 




The TAP controller enters the TEST-LOGIC-RESET state when: 
❑ TMS is asserted for five consecutive TCK cycles, or 
❑ TRST is asserted for a single cycle. 


Both TMS and TRST must be negated to exit the TEST-LOGIC-RESET state 
and move into the RUN-TEST/IDLE state. 
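The reset rule above follows from the IEEE 1149.1 TAP state diagram: from any state, at most five TCK cycles with TMS asserted (logic 1) reach TEST-LOGIC-RESET. The Python sketch below encodes that standard 16-state diagram so the rule can be checked mechanically. State names follow this chapter's spelling. TRST is asynchronous and is not modeled; the helper name `clock_tms` is hypothetical.

```python
# Sketch of the IEEE 1149.1 TAP controller state diagram.
# Each entry maps a state to (next state when TMS = 0, next state when TMS = 1).
TAP = {
    "TEST-LOGIC-RESET": ("RUN-TEST/IDLE", "TEST-LOGIC-RESET"),
    "RUN-TEST/IDLE":    ("RUN-TEST/IDLE", "SELECT-DR-SCAN"),
    "SELECT-DR-SCAN":   ("CAPTURE-DR", "SELECT-IR-SCAN"),
    "CAPTURE-DR":       ("SHIFT-DR", "EXIT1-DR"),
    "SHIFT-DR":         ("SHIFT-DR", "EXIT1-DR"),
    "EXIT1-DR":         ("PAUSE-DR", "UPDATE-DR"),
    "PAUSE-DR":         ("PAUSE-DR", "EXIT2-DR"),
    "EXIT2-DR":         ("SHIFT-DR", "UPDATE-DR"),
    "UPDATE-DR":        ("RUN-TEST/IDLE", "SELECT-DR-SCAN"),
    "SELECT-IR-SCAN":   ("CAPTURE-IR", "TEST-LOGIC-RESET"),
    "CAPTURE-IR":       ("SHIFT-IR", "EXIT1-IR"),
    "SHIFT-IR":         ("SHIFT-IR", "EXIT1-IR"),
    "EXIT1-IR":         ("PAUSE-IR", "UPDATE-IR"),
    "PAUSE-IR":         ("PAUSE-IR", "EXIT2-IR"),
    "EXIT2-IR":         ("SHIFT-IR", "UPDATE-IR"),
    "UPDATE-IR":        ("RUN-TEST/IDLE", "SELECT-DR-SCAN"),
}

def clock_tms(state: str, tms_bits) -> str:
    """Apply one TCK rising edge per TMS bit and return the resulting state."""
    for tms in tms_bits:
        state = TAP[state][1 if tms else 0]
    return state

# Five TCK cycles with TMS asserted reach TEST-LOGIC-RESET from every state.
assert all(clock_tms(s, [1] * 5) == "TEST-LOGIC-RESET" for s in TAP)
# With TMS negated, the controller leaves reset for RUN-TEST/IDLE.
print(clock_tms("TEST-LOGIC-RESET", [0]))
```

The worst cases are SHIFT-DR and PAUSE-DR, which need all five cycles. This is why four cycles are not enough to guarantee a reset.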


The JTAG TAP controller must be reset at power-up to guarantee correct Su- 
perSPARC operation. If TCK is not present, TRST must be asserted at 
power-up to properly reset the TAP controller. Otherwise, the JTAG IR will be in an 
indeterminate state that could result in undefined operations. 


SuperSPARC requires that the external JTAG busmaster TAP controller 
remain in synchronization with SuperSPARC's internal JTAG TAP controller. 


Note: 

All systems (with TCK active) that use the TRST assertion to reset Super- 
SPARC's internal JTAG TAP controller must: 

❑ Keep TMS asserted during TRST assertion. 

❑ Hold TMS asserted for a minimum of three TCK cycles (as seen by Su- 
perSPARC) after negating TRST. 

Note: 
If JTAG is not used in a system, TRST should be asserted to avoid unin- 
tended JTAG operation. 
21.4 JTAG Operations 
The three major scan chain operations defined in IEEE 1149.1 are: 
❑ CAPTURE, 
❑ SHIFT, and 
❑ UPDATE. 


Each of the serially scanned rings internal to SuperSPARC is composed of a 
reconfigurable shift register chain. Each scan chain consists of two register 
chains of equal length: 


❑ The primary scan chain register. 
❑ The update scan chain register. 
The primary scan chain register is configurable as a shift register, while the 
update register is not. (See the IEEE 1149.1 JTAG Specification for more 
details.) The functionality of a single-bit JTAG scan chain register is shown in 
Figure 21-1. 


Figure 21-1. JTAG Register 





Figure 21-2 illustrates the three operations in a simplified functional view. 
Figure 21-2. JTAG Operations: CAPTURE, SHIFT, and UPDATE 




21.4.1 JTAG CAPTURE Operation 


During a CAPTURE operation, the JTAG scan chain element captures se- 
lected data into the primary register. After one TCK, those captured values are 
ready to be shifted out (using a SHIFT operation) through the SuperSPARC TDO 
pin at the end of the scan chain ring. In some specific cases, Capture-DR (Capture 
test data register) captures the value of a particular register into the primary 
register. This capability allows internal logic states to be captured and then 
shifted out to be read. 


21.4.2 JTAG SHIFT Operation 


During a SHIFT operation, the JTAG scan chain element operates as a shift 
register. On each TCK cycle, each primary register loads the value at its input 
(the stage at the TDI end of the ring receives TDI) and passes its previous 
value on to the next primary register in the scan chain, toward TDO. 
For a ring of size N, the TAP controller must shift N times to completely fill the 
scan chain. During this operation, the update register is unused and retains 
its value. 





21.4.3 JTAG UPDATE Operation 


During an UPDATE operation, the JTAG scan chain element delivers the value 
contained in the primary register into the update register. Except for this over- 
write period, the update register retains its previous value. Internal logic sees 
only the value contained in the update register. A single UPDATE cycle 
delivers the intended values to the update register scan chain. The UPDATE is 
performed after all the values have been shifted into the primary scan chain 
registers. During this operation, the primary register retains its value. UPDATE 
is ignored by non-writable registers. 
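The three operations can be summarized in a small behavioral model of one ring. The Python sketch below is illustrative, not a description of SuperSPARC's circuitry. Bit 0 is taken as the stage nearest TDO and bit N-1 as the stage fed by TDI, which matches how this chapter describes the MDIN and MSTAT rings. The BSCAN maps number their bits the other way, with bit[289] (SSP) or bit[484] (MXCC) nearest TDO. The class and method names are hypothetical.

```python
# Behavioral sketch of one JTAG ring with primary and update registers.

class ScanChain:
    def __init__(self, length: int, writable: bool = True):
        self.primary = [0] * length
        self.update = [0] * length
        self.writable = writable

    def capture(self, values):
        """CAPTURE: load selected data (one bit per stage) into the primary register."""
        if len(values) != len(self.primary):
            raise ValueError("capture data must match the ring length")
        self.primary = list(values)

    def shift(self, tdi: int) -> int:
        """SHIFT: one TCK. Bit 0 leaves on TDO, every stage moves toward TDO,
        and TDI enters bit N-1. The update register is not touched."""
        tdo = self.primary[0]
        self.primary = self.primary[1:] + [tdi & 1]
        return tdo

    def do_update(self):
        """UPDATE: copy primary into update in one cycle; ignored if not writable."""
        if self.writable:
            self.update = list(self.primary)

chain = ScanChain(4)
chain.capture([1, 0, 1, 1])
out = [chain.shift(b) for b in [0, 0, 1, 1]]   # N shifts empty and refill the ring
chain.do_update()
print(out, chain.update)
```

After N shifts the captured bits have all left on TDO in bit order, and the first TDI bit shifted in has reached bit 0.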
21.5 The SuperSPARC Processor JTAG Instructions 
Five scan domains inside the SSP are accessible through JTAG: 
❑ The IR register domain, 
❑ The standard JTAG register domains (BYPASS, CID, BSCAN), 
❑ An internal hardware test domain (internal use only), 
❑ The BIST register domains (SHORT BIST, LONG BIST, SIGNATURE), 
and 
❑ The scan-based debug register domains (MDIN, MCI, MSTAT, MDOUT). 
Figure 21-3 is the block diagram of all JTAG-accessible serial-scan chains. 
Figure 21-3. Block Diagram of JTAG Scan Chains Inside SuperSPARC 





The SuperSPARC five-bit IR selects Test Data Register (TDR) scan chains for 
the SHIFT, UPDATE, and CAPTURE operations. The IR scan ring is selected 
when the JTAG TAP controller is in the UPDATE-IR state. A Capture-IR does 
not capture the current value of the IR update register. Instead, it returns a fixed 
binary encoding of 00001. The low two bits of this encoding (01) are required by 
the IEEE 1149.1 Standard Test Access Port and Boundary-Scan Architecture. 
The scan-based debug register domain consists of four scan chains: MDIN, 
MCI, MSTAT, and MDOUT. MDIN and MCI provide information from the SDB 
to the SSP, while MSTAT and MDOUT bring information from SuperSPARC to 
the SDB. See Chapter 22 for more details. 


Internal scan rings are for manufacturing use only; any other use produces 
undefined results. 


The BIST register domain permits a BIST sequencer to internally generate, 
scan-in, apply, scan-out, and obtain a signature for state vectors. 


The IR encoding to select access to a particular SSP TDR scan chain is shown 
in Table 21-1. 


Table 21-1. TDR Scan Chain Selection by IR Encoding 


НАЕ — | oor _ 
колт — — — | oa — 
(тета! Scan Capture Clock Mode | 003 | 
memwSmnbmani — | 995 — 
Da | прав | 
мон — — — С 
[cout — ИССИ ИСИ 
wr — — — | | 5 
[Swwug — [ою | = 
[P _______| ow | № | 
со — ooe 32 | 
| БУРА —^ | ot | т | 


A brief description of each instruction is given below. For more details, see the 
IEEE 1149.1 Standard Test Access Port and Boundary-Scan Architecture. 






п/а 
n/a 
na 
п/а 
37 
21.5.1 BYPASS 


The BYPASS scan chain comprises a one-bit primary scan register, with no 
update register. When IR selects BYPASS, the chip's TDI and TDO are essen- 
tially connected to this primary register. The Capture-DR state loads a one-bit 
zero into the primary register and Shift-DR requires one TCK cycle to forward 
data. The BYPASS register can be used to reduce the total scan chain length 
when other devices on the same TDI/TDO chain are being accessed. Update- 
DR has no effect on the BYPASS operation. 
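The saving from BYPASS is easy to quantify: each bypassed device adds one bit to the data-register path, and only the target contributes its full selected register. The Python sketch below computes that path length. The two-device board is hypothetical; the register lengths are the SSP BSCAN (290 bits) and MXCC Rev. 2.X BSCAN (485 bits) figures given in this chapter.

```python
# Sketch: DR path length when every device except `target` is in BYPASS.

def dr_path_length(chain_lengths, target: int) -> int:
    """chain_lengths: selected DR length per device, in chain order."""
    return sum(length if i == target else 1 for i, length in enumerate(chain_lengths))

# Hypothetical board with the SSP and the MXCC on one TDI/TDO chain.
print(dr_path_length([290, 485], target=0))   # SSP BSCAN plus the MXCC's 1-bit BYPASS
```

Because BYPASS captures a zero in Capture-DR, the first bit a bypassed device shifts out is 0. A host can use this to check the chain before relying on the path length.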


21.5.2 Component ID 


The Component ID (CID) scan chain comprises a 32-bit ring of primary regis- 
ters, where bit[0] is always 1, bits[11:1] hold the manufacturer ID, bits[27:12] the 
part number, and bits[31:28] the version number (see Figure 21-4). Capture-DR 
loads the SuperSPARC component ID into the CID primary register's scan 
chain. Subsequent Shift-DR cycles shift bit[0] out to TDO and scan new TDI data 
into bit[31]. There is no update register, so Update-DR has no effect. 


Figure 21-4. JTAG ID Register Format 


ver (bits 31:28) | pnum (bits 27:12) | manid (bits 11:1) | 1 (bit 0) 
Ver Version number. Incremented on component revisions. 


pnum Part Number. A component ID assigned by the manufacturer. 
The SSP has a value of 0x04 in this field. 


manid Manufacturer ID. The identification number of the component's 
manufacturer. This field is set to 0x17 for Texas Instruments. 
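A host reading the CID ring can split the 32-bit value into the Figure 21-4 fields. The Python sketch below does this. The example value is built from the documented pnum (0x04) and manid (0x17) fields with a made-up version of 1. It is not the ID code of any particular SSP revision, and the helper name `decode_idcode` is hypothetical.

```python
# Sketch: split a 32-bit JTAG component ID into the Figure 21-4 fields.

def decode_idcode(idcode: int) -> dict:
    if (idcode & 1) != 1:
        raise ValueError("bit[0] of a JTAG component ID is always 1")
    return {
        "ver": (idcode >> 28) & 0xF,       # bits 31:28
        "pnum": (idcode >> 12) & 0xFFFF,   # bits 27:12
        "manid": (idcode >> 1) & 0x7FF,    # bits 11:1
    }

# Illustrative value: version 1 (made up), part number 0x04, TI manufacturer ID 0x17.
example = (1 << 28) | (0x04 << 12) | (0x17 << 1) | 1
print(hex(example), decode_idcode(example))
```

The same decoder applies to the MXCC's CID register (Figure 21-5), whose part number field is 0x03.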


21.5.3 Boundary Scan (BSCAN) 


SuperSPARC BSCAN consists of three operations: BSCAN-EXTEST, 
BSCAN-SAMPLE, and BSCAN-INTEST. The SSP BSCAN is a 290-bit ring of 
JTAG scan chain register elements. Capture-DR reads data from the chip pins 
into the primary register. Shift-DR forwards data through the scan chain, where 
bit{289] is output to ТОО. (See note below.) For the entire chain to be com- 
pletely written in or read out, 290 TCK cycles are needed. Update-DR copies 
data from the primary register into the update register, requiring only one TCK 
Cycle. Table 21-2 portrays the SSP's boundary scan map. 





Note: 


SSP revisions 1.X, 2.X, and 3.X have a BSCAN chain length of 290 bits. Fu- 
ture revisions may have different BSCAN chain lengths. 
The SAMPLE/PRELOAD instruction is used to preload data into the BSCAN 
chain for use by other instructions and to sample the data flowing through the 
SSP pins. The SAMPLE portion of the instruction occurs on the rising edge of 
TCK during the Capture-DR state, while the PRELOAD occurs during the 
Update-DR state on the rising edge of TCK. The PRELOAD data is scanned in 
while the SAMPLE data is scanned out. 


When the EXTEST instruction is selected, data loaded into the BSCAN cells 
that correspond to output pins is forced onto the SSP output pins on the falling 
edge of TCK in the Update-IR state. This data changes only on the falling edge 
of TCK in the Update-DR state. This instruction allows testing of board-level 
interconnections. When this instruction is selected, all signals received at the 
SSP input pins are loaded into the BSCAN chain in the Capture-DR state. 


Selection of the INTEST instruction allows testing of the SSP's internal logic. 
Data loaded into the BSCAN cells that correspond to input pins is forced into 
the SSP's logic during the Update-IR state. This data changes only on the 
falling edge of TCK in the Update-DR state. Outputs of the logic are captured 
into the BSCAN register in the Capture-DR state and can then be scanned out 
for analysis. 


21.5.4 SHORT BIST, LONG BIST 


The SuperSPARC BIST mechanism is initiated by, or examined through, either 
JTAG or ASI memory references. When BIST is initiated, any pre-BIST Super- 
SPARC state is destroyed. At the completion of a JTAG-initiated BIST, the 
user must generate a reset. This can be done by entering the TAP reset state, 
either by asserting TMS for five consecutive TCK cycles or by asserting 
TRST. Internal timing sequencing guarantees PLL restabilization. Once 
initiated, BIST is under the control of SuperSPARC, and the JTAG TAP 
controller need not remain in the WAIT/RUN BIST state. See Section 13.4 for 
more details. UPDATE has no effect. At the completion of BIST, a CAPTURE of 
the SIGNATURE TDR should be done. 

Note: 
LONG BIST operation is not tested during manufacturing test. 


21.5.5 Signature 


The signature scan chain is a 31-bit ring. Both long and short BIST operations 
cause the BIST sequencer to generate, scan in, apply, and scan out a signa- 
ture for one of two predefined pseudo-random test vectors. The SuperSPARC 
response to these vectors is collected and compressed into the signature reg- 
ister. The correct value of the signature register is different for these two 
cases. See Section 13.4 for more details. 
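This section does not describe how responses are compressed into the 31-bit signature. One common structure for this kind of compression is a multiple-input signature register (MISR), and the Python sketch below shows the idea. Everything specific here is an assumption: the feedback polynomial x^31 + x^3 + 1, the zero seed, and one 31-bit response word per step are hypothetical choices, not the SSP's. The sketch therefore cannot predict a correct SSP signature.

```python
# Generic 31-bit MISR sketch (Galois form). The polynomial, seed, and word size are
# hypothetical; the SuperSPARC signature logic is not specified in this section.
SIG_BITS = 31
MASK = (1 << SIG_BITS) - 1
POLY_LOW = (1 << 3) | 1   # low terms of the hypothetical x^31 + x^3 + 1

def misr_step(state: int, response: int) -> int:
    """Shift the register once with feedback, then fold in one response word."""
    msb = (state >> (SIG_BITS - 1)) & 1
    state = ((state << 1) & MASK) ^ (POLY_LOW if msb else 0)
    return state ^ (response & MASK)

def signature(responses, seed: int = 0) -> int:
    state = seed & MASK
    for r in responses:
        state = misr_step(state, r)
    return state

print(hex(signature([5, 7, 9])), hex(signature([5, 7, 8])))
```

Because the register is linear, a single wrong response bit always changes the final signature. This is why comparing one captured value against the expected signature is enough to flag a failure.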
Table 21-2. SuperSPARC Boundary Scan Bit Definition 
Note: 


This table reflects SSP revisions 1.X, 2.X, and 3.X. Future SSP revisions 
may have different BSCAN chain lengths and configurations. 




21.5.6 Scan-Based Debug 
There are four independent scan rings associated with scan-based debug: 
❑ MDIN, 
❑ MCI, 
❑ MSTAT, and 
❑ MDOUT. 


The MDIN scan chain is a 32-bit ring. Shift-DR shifts data from a primary regis- 
ter into the next primary register. Bit[0] goes out to TDO, while bit[31] reads in 
TDI. Update-DR copies data from the primary register into the update register. 
Capture-DR has no effect. 


The MCI scan chain is a 37-bit ring. Shift-DR shifts the primary register to the 
next primary register in the chain, where bit[36] reads in TDI and bit[0] outputs 
TDO. Update-DR copies data from the primary register into the update regis- 
ter. 


The MSTAT scan chain is a 13-bit ring. Capture-DR captures data from the 
MSTAT register into the primary register. Shift-DR shifts the primary register 
to the next primary register in the chain, where bit[12] reads TDI and bit[0] 
outputs TDO. Update-DR has no effect. 


The MDOUT scan chain is a 32-bit ring. Capture-DR captures data from the 
MDOUT register into the primary register. Shift-DR shifts the primary register 
to the next primary register in the chain, where bit[31] reads TDI and bit[0] out- 
puts TDO. Update-DR has no effect. See Chapter 22 for more details on scan- 
based debug. 
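Because bit[0] of each debug ring drives TDO and the top bit receives TDI, a host loads MDIN or MCI least significant bit first and reassembles MDOUT or MSTAT in the order the bits arrive. The Python sketch below shows that ordering using the ring lengths given above. The helper names are hypothetical, and the contents of the MCI fields are not modeled.

```python
# Sketch of host-side bit ordering for the scan-based debug rings.
DEBUG_RING_BITS = {"MDIN": 32, "MCI": 37, "MSTAT": 13, "MDOUT": 32}

def tdi_bits(value: int, ring: str):
    """TDI sequence (first bit first) that leaves `value` in the ring after N shifts."""
    n = DEBUG_RING_BITS[ring]
    if not 0 <= value < (1 << n):
        raise ValueError(f"{ring} holds {n} bits")
    return [(value >> i) & 1 for i in range(n)]   # least significant bit first

def value_from_tdo(bits) -> int:
    """Reassemble a captured register from its TDO bits (bit[0] arrives first)."""
    return sum(bit << i for i, bit in enumerate(bits))

print(len(tdi_bits(0x1234, "MCI")))   # one TDI bit per Shift-DR cycle
```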


21.5.7 SEE PLL 
This scan chain is used to verify the integrity of the on-chip PLL with respect 
to clock jitter and VCO behavior. When selected, the scan chain will output the 
PLL clock on TDO. This scan chain is used for hardware manufacturing tests and 
should not be used when the SSP is plugged into a board. 


21.5.8 INTERNAL SCAN 


This scan chain is used for hardware manufacturing tests; details are not pro- 
vided in this manual. 
21.6 MultiCache Controller JTAG Instructions 


Four scan domains inside the MultiCache Controller (MXCC) are accessible 
via JTAG: 


❑ The IR domain, 
❑ The standard JTAG register domain (BYPASS, CID, BSCAN), 
❑ An internal hardware test domain (for internal use only), and 
❑ The status scan domain (DATASCAN, BCSCAN). 


The MXCC-IR four-bit serial register selects the test data register scan chains 
for the SHIFT, UPDATE, and CAPTURE operations. The IR scan ring is 
selected when the JTAG TAP controller is in the Update-IR state. A Capture-IR 
returns the fixed binary value of 0001 as specified by the IEEE 1149.1 Standard 
Test Access Port and Boundary-Scan Architecture. 
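The fixed Capture-IR values give a host a quick integrity check. After Capture-IR, shifting the whole IR path out should produce a predictable pattern, and each device's IR must end in the required 01. The Python sketch below computes that expected pattern. The chain order (SSP nearer TDI, MXCC nearer TDO) is a hypothetical board arrangement, and the helper name is illustrative.

```python
# Sketch: expected Shift-IR output for a daisy chain right after Capture-IR.
# Devices are listed from board TDI toward board TDO. The device nearest TDO
# shifts out first, least significant bit first.

def expected_ir_capture(devices):
    """devices: list of (ir_length, capture_value) pairs in TDI-to-TDO order."""
    bits = []
    for length, capture in reversed(devices):
        bits.extend((capture >> i) & 1 for i in range(length))
    return bits

SSP = (5, 0b00001)   # SuperSPARC: 5-bit IR, Capture-IR returns 00001
MXCC = (4, 0b0001)   # MXCC: 4-bit IR, Capture-IR returns 0001
assert all((c & 0b11) == 0b01 for _, c in (SSP, MXCC))   # IEEE 1149.1 low-bit rule
print(expected_ir_capture([SSP, MXCC]))
```

If the observed TDO stream differs from this pattern, the chain is broken, miswired, or contains a device the host did not expect.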


The IR encoding for each scan chain supported by MXCC is shown in 
Table 21-3. 


Table 21-3. JTAG Instruction Register Encoding 




















Оне | птш — | Зиму — 
[оо [ют | Boundary Scan — 
п | ЗАМАШЕРАВАО | Boundary Sen 
а [тэн [тетт 
ле [олтон  SuusSan — | 
ов oo ____| меки ____| 
ът _| возом | вооот — 
оао [www — [а — | 





JTAG Serial Scan Interface 


Subject to Change Without Notice 


21.6.1 BYPASS 







The BYPASS instruction selects the Bypass Test Data Register, which is a 
single-bit scan register. The Bypass register can be used to reduce total scan 
chain length when other devices on the same TDI/TDO chain are being 
accessed. When there are many chips on a board in the same chain and only 
a few chips need to be accessed, the other chips should have the Bypass 
register selected. The BYPASS instruction can be scanned into all the other 
devices, thereby considerably reducing the length of the overall scan chain. 
When the Bypass test data register is selected, a single bit of logic zero will be 
sent to TDO in the Shift-DR state. Thereafter, TDO will follow TDI delayed by 
one TCK. 


21.6.2 COMPONENT ID (CID) 


The CID instruction selects the CID Test Data Register, which is a 32-bit scan 
register. The CID register has the format shown in Figure 21-5 and contains 
the MXCC's assigned JTAG component identifier. This register can only be 
captured—never updated. 


Figure 21-5. JTAG ID Register Format 


ver (bits 31:28) | pnum (bits 27:12) | manid (bits 11:1) | 1 (bit 0) 
ver Version number. Incremented on component revisions. 
pnum Part Number. A component ID assigned by the manufacturer. 
The MXCC has a value of 0x03 in this field. 


manid Manufacturer ID. The identification number of the component's 
manufacturer. This field is set to 0x17 for Texas Instruments. 


21.6.3 Boundary Scan (BSCAN) 


The BSCAN TDR of the MXCC is 485 bits long (see note below). This register 
supports three JTAG instructions: 


❑ SAMPLE/PRELOAD, 

❑ EXTEST, and 

❑ INTEST. 

Table 21-5 is the map of the MXCC boundary scan chain in the MBus configu- 
ration. Table 21-6 is the map in the XBus configuration. Bit[484] is connected 
to TDO; bit[0] is connected to TDI. Update-DR copies data from the primary 
register into the update register. 


The SAMPLE/PRELOAD instruction is used to preload data into the BSCAN 
chain for use by other BSCAN instructions and to sample the data flowing 
through the MXCC pins. The SAMPLE portion of the instruction occurs on the 
rising edge of TCK during the Capture-DR state, while the PRELOAD occurs 
during the Update-DR state on the rising edge of TCK. The PRELOAD data 
is scanned in while the SAMPLE data is scanned out. 


When the EXTEST instruction is selected, data loaded into the BSCAN cells 
that correspond to output pins is forced onto the MXCC output pins on the 
falling edge of TCK in the Update-IR state. This data changes only on the 
falling edge of TCK in the Update-DR state. This instruction allows testing of 
board-level interconnections. When this instruction is selected, all signals 
received at the MXCC input pins are loaded into the boundary scan chain in 
the Capture-DR state. 


Selection of the INTEST instruction allows testing of the MXCC's internal logic. 
Data loaded into the BSCAN cells that correspond to input pins is forced into 
the MXCC's logic during the Update-IR state. This data changes only on the 
falling edge of TCK in the Update-DR state. Outputs of the logic are captured 
into the BSCAN register in the Capture-DR state and can then be scanned out 
for analysis. 


Note: 


Rev. 2.X MXCC BSCAN chain is 485 bits. Rev. 1.X MXCC BSCAN chain is 
483 bits. Future revisions may have different BSCAN chain lengths. 


21.6.4 DATASCAN 


The DATASCAN instruction selects the Datascan TDR. This register consists 
of the MXCC Control Register, Reset Register, Error Register, Status Register, 
and BIST signature register. When this register is selected, all of these regis- 
ters may be captured and scanned out during normal operation of the MXCC. 
This register may be captured but not updated. The format of the Datascan 
register is shown in Table 21-4. 


21.6.5 BCSCAN 


The BCSCAN TDR consists of a one-bit primary register. A JTAG master can 
use this instruction to update the contents of the MXCC STATUS.BC bit while 
the system is running, to communicate with the SSP. 


21.6.6 INTSCAN/INTSHIFT 


Table 21-4. DATASCAN Register Format 

Field 
SI (Software Internal Reset) 
WD (Watchdog Reset) 
119-120 | Reserved (reads zero) 
S (Supervisor Mode) 
ERR[0:7] (Error Code) 
CCOP[0:9] (Cache Controller Operation Code) 
EV (Error Information Valid) 
AE (Asynchronous Error) 
CP (Parity Error, MXCC Master) 
VP (Parity Error, SuperSPARC Master) 
CC (Cache Consistency Error) 
XP (XBus Parity Error) 
ME (Multiple Errors) 
122-129 
Table 21-5. MXCC JTAG Boundary Scan Bit Order for MBus 
Note: 
Table 21-5 reflects MXCC Rev. 2.X BSCAN chain. MXCC Rev. 1.X BSCAN 
does not contain the MX and GTLREF1 bits. Future revisions may have 
different BSCAN chains. 


Table 21-6. MXCC JTAG Boundary Scan Bit Order on XBus 
Note: 
Table 21-6 reflects MXCC Rev. 2.X BSCAN chain. MXCC Rev. 1.X BSCAN 
does not contain the MX and GTLREF1 bits. Future revisions may have 
different BSCAN chains. 
21.7 System-Level Test 


The IEEE 1149.1 Standard Test Access Port and Boundary-Scan Architecture 
section "Legal Interconnections Of Components Compatible With 1149.1" 
provides guidelines on how to connect TCK, TMS, TDI, and TDO to build 
serial, parallel, and hierarchical configurations for JTAG board-level 
subsystems. SuperSPARC supports this hierarchical JTAG test technique. 
Much as SuperSPARC selects TDR scan chains, a second-level JTAG controller 
(e.g., a board-level JTAG busmaster) may connect multiple chips on a board to 
form parallel and serial paths. Furthermore, a third-level JTAG controller 
(e.g., a backplane-level JTAG busmaster) may connect multiple second-level 
JTAG controllers to form a chain. 


Example 21-1 shows one board-level JTAG busmaster controlling six JTAG 
components configured into two parallel daisy chains. Each component in the 
chain connects its TDI to the preceding chip's TDO. The first chip in each of 
the parallel daisy chains is tied to a common level-two TDI, and the last chip 
in each chain is wire-ORed to a common level-two TDO. Each of the parallel 
chains receives a unique TMS from the second-level TAP controller. 
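The shared, wire-ORed TDO works because IEEE 1149.1 devices drive TDO only while data is being shifted (the Shift-IR and Shift-DR states) and leave it inactive otherwise. The second-level controller must therefore use the per-chain TMS signals so that at most one parallel chain is shifting at a time. The Python sketch below models that rule. The state strings follow this chapter's naming, and the helper name `board_tdo` is hypothetical.

```python
# Sketch of the shared TDO in Example 21-1: only a chain in a Shift state drives it.
SHIFT_STATES = {"SHIFT-IR", "SHIFT-DR"}

def board_tdo(chain_states, chain_tdo):
    """Return the bit on the shared TDO, or None if no chain is driving it."""
    driving = [i for i, s in enumerate(chain_states) if s in SHIFT_STATES]
    if len(driving) > 1:
        raise RuntimeError("more than one parallel chain is driving the shared TDO")
    return chain_tdo[driving[0]] if driving else None

# Chain 0 shifts data while chain 1 is parked in RUN-TEST/IDLE by its own TMS.
print(board_tdo(["SHIFT-DR", "RUN-TEST/IDLE"], [1, 0]))
```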


Example 21-1. System-Level JTAG Test Hierarchy 
The SuperSPARC processor (SSP) uses the JTAG 1149.1 serial scan interface 
to allow access to scan-based debug features. These scan-based debug 
features allow you to debug systems in a non-intrusive manner. 


22.1 Scan-Based Debug Support 


The SSP provides facilities to observe and control processor execution from 
a remote device using the IEEE 1149.1 JTAG serial scan interface. The JTAG 
interface is described in Chapter 21. 


Traditional system debugging methods required very expensive dedicated 
add-on hardware, connected using ribbon cables and fragile connectors, which 
was generally not available when the first processor prototypes were delivered 
(which is when it would have been most useful). Furthermore, the length of the 
cable limited the processor system clock rate when the emulator was being 
used, and the cable introduced extra electrical loading, which affected pin 
timings. 


As a solution to that problem, SuperSPARC provides scan-based debug logic 
to support a remote environment for debugging systems, done entirely over 
the serial JTAG bus. The features are useful for both hardware and software 
development. The debug logic is completely non-intrusive to the system design: 
neither the pin timings nor the processor speed is affected. Nearly all features 
of traditional emulators are provided, except for real-time trace and memory 
emulation. 


All programmer-visible state can be read and changed through the JTAG interface. Software must be provided to control the scan-based debug logic from a remote computer with a JTAG interface. During scan-based debug, the caches and store buffer continue to operate and snoop incoming system bus requests. Many scan-based debug resources are shared with standard software-debugging features.


Scan-Based Debug Strategy 


SuperSPARC provides scan-based debug logic to aid in system debug and failure analysis. The scan-based debugger's (SDB) interface to SuperSPARC uses the JTAG instruction register (IR) to select one of four scan-based debug JTAG Test Data Registers (TDRs). The SDB supplies SuperSPARC with protocol commands, scan-based debug instructions, addresses, and data for updating SuperSPARC state through the Scan Command and Instruction (MCI) TDR and the Scan Data In (MDIN) TDR. SuperSPARC returns existing system state data and protocol status through two other JTAG TDRs: Scan Data Out (MDOUT) and Scan Status (MSTAT).


Through these registers, the SDB can command the processor to temporarily halt execution of the normal SPARC instruction stream. Once halted, the processor can be directed to execute any normal SPARC instruction. Scan-based debug alters only the processor state that it explicitly modifies.


Scan-Based Debug 
22.2 The Scan-Based Debugger


The SDB runs scan-based debug primitives on the SSP. It consists of hardware and software, together with a user interface that lets the user enter scan-based debug mode, scan instructions into the MCI register, and check the status of instruction execution.


The hardware portion of the SDB consists of a host system, a JTAG controller 
card, and a cable to carry the JTAG signals to the target system. The JTAG 
controller card must be capable of receiving commands from the host system 
and converting them into appropriate serial JTAG signals. It must also take the 
serial JTAG signals from the target system under test and convert them into 
a format that the host system can understand. See Figure 22-1 for the system 
hardware configuration. 


Figure 22-1. SDB Hardware Configuration

[Figure: host system, JTAG Controller Card, and cable carrying the JTAG signals to the target system]


At a minimum, the SDB software should contain a driver routine for the JTAG controller card, plus higher-level software that makes it easy to write programs to run on the target system while in scan-based debug mode.
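The two software layers described above can be sketched in C. This is a minimal illustration, not a real driver: the `jtag_driver` structure, its `scan_ir`/`scan_dr` callbacks, and the `IR_*` codes are hypothetical (the actual IR encodings belong to Table 22-2), and the mock driver exists only so the layering can be exercised without hardware. The register lengths (37-bit MCI, 32-bit MDIN and MDOUT, 13-bit MSTAT) come from Section 22.3.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical low-level driver interface. A real JTAG controller card
 * would supply its own routines; these names are illustrative only. */
typedef struct {
    void     (*scan_ir)(void *ctx, uint32_t ir_code);           /* load the JTAG IR */
    uint64_t (*scan_dr)(void *ctx, uint64_t in, unsigned nbits); /* shift selected TDR */
    void      *ctx;
} jtag_driver;

/* Hypothetical IR codes; the real encodings are listed in Table 22-2. */
enum { IR_MCI = 1, IR_MDIN = 2, IR_MDOUT = 3, IR_MSTAT = 4 };

/* Higher-level layer: one call per scan-based debug register. */
static void sdb_write_mdin(const jtag_driver *d, uint32_t value) {
    d->scan_ir(d->ctx, IR_MDIN);
    (void)d->scan_dr(d->ctx, value, 32);
}

static void sdb_write_mci(const jtag_driver *d, uint64_t mci) {
    assert(mci < (1ULL << 37));               /* MCI is 37 bits long */
    d->scan_ir(d->ctx, IR_MCI);
    (void)d->scan_dr(d->ctx, mci, 37);
}

static uint32_t sdb_read_mdout(const jtag_driver *d) {
    d->scan_ir(d->ctx, IR_MDOUT);
    return (uint32_t)d->scan_dr(d->ctx, 0, 32);
}

static uint16_t sdb_read_mstat(const jtag_driver *d) {
    d->scan_ir(d->ctx, IR_MSTAT);
    return (uint16_t)(d->scan_dr(d->ctx, 0, 13) & 0x1FFFu); /* MSTAT is 13 bits */
}

/* Mock driver that records the last scan and returns a canned value. */
typedef struct {
    uint32_t last_ir;
    uint64_t last_in;
    unsigned last_bits;
    uint64_t reply;
} mock_jtag;

static void mock_scan_ir(void *ctx, uint32_t ir) {
    ((mock_jtag *)ctx)->last_ir = ir;
}

static uint64_t mock_scan_dr(void *ctx, uint64_t in, unsigned nbits) {
    mock_jtag *m = ctx;
    m->last_in = in;
    m->last_bits = nbits;
    return m->reply;
}
```

Keeping the register widths in the higher layer means only the driver callbacks need to change when a different controller card is used.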
22.3 Scan Registers


Five JTAG scan registers are important for scan-based debug: the instruction register and four TDRs. They are listed in Table 22-1.


Table 22-1. JTAG TDR Scan Registers

Mnemonic | Register
IR       | Instruction Register
MDIN     | Scan Data In
MCI      | Scan Command and Instruction
MDOUT    | Scan Data Out
MSTAT    | Scan Status


See Chapter 21 for details on JTAG TDR scan operation.










22.3.1 Instruction Register (IR) 


The IR selects which JTAG TDR scan chain is accessed. Table 22-2 lists only the IR encodings that select the scan-based debug registers.


Table 22-2. Scan Register Selection

[Table: Register Selected | IR encoding]
22.3.2 Scan Data In (MDIN)


The MDIN register is a 32-bit register that passes information from the scan controller to the SSP. The format of the MDIN register is shown in Table 22-3.


Table 22-3. MDIN Register Format

 31                                 0
|               MDIN                 |
Data is scanned into the MDIN register from the SDB. An instruction sequence can move this data from the JTAG MDIN register into the SSP integer register file using a load instruction with a SPARC address space identifier (ASI). The ASI is an eight-bit value that is appended to the address of a memory access; it identifies special modes and address spaces. The ASI for access to the MDIN register is 0x44. The specific SPARC instruction that performs this operation is:


LDA [%g0] 0x44, %reg
where %reg is an integer register. Subsequent scan debug instructions can 
use this data to update memory or processor state. 
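Because the SDB scans a raw 32-bit word into MINST, the host has to encode instructions such as this LDA itself. A minimal C sketch, assuming the SPARC V8 format-3 layout with i = 0 (register + register addressing, ASI in bits 12:5) and the standard register numbering (%g0-%g7 = 0-7, %o0-%o7 = 8-15, %l0-%l7 = 16-23, %i0-%i7 = 24-31); the function names are illustrative:

```c
#include <assert.h>
#include <stdint.h>

#define OP3_LDA  0x10u   /* LDA: load word from alternate space */
#define ASI_MDIN 0x44u   /* MDIN ASI, from the text above */

/* SPARC V8 format-3 word, register + register form with an ASI:
 *   op[31:30]=3  rd[29:25]  op3[24:19]  rs1[18:14]  i[13]=0  asi[12:5]  rs2[4:0] */
static uint32_t sparc_fmt3_asi(unsigned op3, unsigned rd, unsigned rs1,
                               unsigned rs2, unsigned asi) {
    assert(op3 < 64 && rd < 32 && rs1 < 32 && rs2 < 32 && asi < 256);
    return (3u << 30) | ((uint32_t)rd << 25) | ((uint32_t)op3 << 19) |
           ((uint32_t)rs1 << 14) | ((uint32_t)asi << 5) | (uint32_t)rs2;
}

/* MINST word for "lda [%g0] 0x44, %rd" (rs1 = rs2 = %g0, register 0). */
static uint32_t minst_load_mdin(unsigned rd) {
    return sparc_fmt3_asi(OP3_LDA, rd, 0, 0, ASI_MDIN);
}
```

For example, `minst_load_mdin(1)` yields 0xC2800880, the word for `lda [%g0] 0x44, %g1`.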
22.3.3 Scan Command and Instruction (MCI) 
The MCI scan chain is 37 bits long and consists of two fields: a five-bit scan command register (MCMD) and a 32-bit scan instruction register (MINST). The format of the MCI scan register is shown in Table 22-4.


Table 22-4. Scan Command and Instruction Register (MCI)

 36      32 31                                0
|   MCMD   |              MINST                |


MCMD Scan Command Register. Its component fields are 
shown in Table 22-5. 


MINST Scan Instruction. This register contains a single 
SPARC instruction that is executed as the scan in- 
struction (qualified by MEXEC). Several scan in- 
structions are typically required to completely 
execute the semantics of a scan primitive. 


The MCMD field contains the information that qualifies the 32-bit SPARC 
instruction in the MINST field. All bits in the MCMD field are cleared when the 
JTAG TAP controller resets. 

Table 22-5. Scan Command Register (MCMD)

   36       35       34      33       32
| INITM | MENTER | MEXEC | MEXIT | MRESET |


INITM     Enable Scan-Based Debug. The primary function of this bit is to enable SuperSPARC to enter scan-based debug mode as a result of a breakpoint, if the ACTION register is properly programmed. This bit also affects the operation of signal scan-based debug (SIGM), a SuperSPARC-specific instruction: if INITM = 0, SIGM executes as a NOP; if INITM = 1, SIGM initiates a user-level scan-based debug mode entry. This bit is cleared on JTAG TAP controller reset.

MENTER    Enter scan-based debug mode. When set, SuperSPARC is forced to enter scan-based debug mode. The program counter (PC/nPC pair) is captured so that execution can resume after scan-based debug mode exit. In scan-based debug mode, the SuperSPARC prefetch controller stops accessing the instruction cache, starts passing NOPs into the IU pipeline, and examines MEXEC while it waits for an instruction to execute. This bit is cleared on JTAG TAP controller reset.

MEXEC     Execute MINST. When set, a single instance of the instruction in the MINST register is forced into the processor pipeline and executed as a normal SPARC instruction. Once the instruction is launched, the prefetch controller clears MEXEC, resumes passing NOPs into the IU pipeline, and monitors the valid bits at the last stage of the IU pipeline. When the prefetch controller determines that no instructions remain in the processor pipeline, it examines MEXIT to determine whether to remain in scan-based debug mode or to resume normal execution. The MEXEC bit is cleared on JTAG TAP controller reset.


MEXIT     Exit scan-based debug mode. When set, SuperSPARC exits scan-based debug mode (to resume normal execution) as soon as all execution in the pipeline is complete. The execution stream branches to the PC/nPC values stored on entry to scan-based debug mode. The prefetch controller continues to pass NOPs into the pipeline until either MEXEC or MEXIT is asserted. This bit is cleared on JTAG TAP controller reset.


MRESET Hardware Reset. When set, this bit forces a full Su- 
perSPARC hardware reset (rather than a watchdog 
reset). This bit is cleared on JTAG TAP controller re- 
set. 
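Because MCMD occupies bits 36-32 of the 37-bit MCI chain and MINST occupies bits 31-0, the host can build an MCI scan value by OR-ing command bits above a 32-bit instruction word. A minimal C sketch, assuming the host holds the 37-bit chain in a `uint64_t` (how the bits are actually shifted out is left to the JTAG driver); the macro and function names are illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* MCMD bit positions within the 37-bit MCI scan chain (Table 22-5). */
#define MCMD_MRESET    (1ULL << 32)
#define MCMD_MEXIT     (1ULL << 33)
#define MCMD_MEXEC     (1ULL << 34)
#define MCMD_MENTER    (1ULL << 35)
#define MCMD_INITM     (1ULL << 36)
#define MCI_MINST_MASK 0xFFFFFFFFULL

/* Pack MCMD bits and a 32-bit MINST word into one 37-bit MCI value. */
static uint64_t mci_pack(uint64_t mcmd_bits, uint32_t minst) {
    assert((mcmd_bits & MCI_MINST_MASK) == 0);   /* command bits only */
    assert(mcmd_bits < (1ULL << 37));            /* stays within 37 bits */
    return mcmd_bits | minst;
}

/* Split an MCI value back into its fields. */
static uint32_t mci_minst(uint64_t mci) { return (uint32_t)(mci & MCI_MINST_MASK); }
static unsigned mci_mcmd(uint64_t mci)  { return (unsigned)((mci >> 32) & 0x1Fu); }
```

For example, `mci_pack(MCMD_MENTER | MCMD_MEXEC, 0x01000000)` (0x01000000 is the SPARC NOP, `sethi 0, %g0`) gives 0xC01000000.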


22.3.4 Scan Data Out (MDOUT)


The MDOUT register stores data to be passed back to the scan controller card. 
The format of the MDOUT register is shown in Table 22-6. 


Table 22-6. Scan Data Out Register (MDOUT)

 31                                 0
|               MDOUT                |


An instruction sequence can move outgoing processor state data from the integer register file to the MDOUT register by storing a 32-bit word to the MDOUT ASI space. The ASI for MDOUT is 0x46. The specific instruction that performs this operation is:


STA %reg, [%g0] 0x46


where %reg is a SPARC integer register. The SDB can then scan this data out 
of MDOUT and display the information or status. 
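Like the LDA shown for MDIN, this STA must be scanned into MINST as a raw 32-bit word. A minimal sketch of its SPARC V8 format-3 encoding, assuming i = 0 and rs1 = rs2 = %g0; note that in SPARC V8 stores the source register occupies the rd field. The function name is illustrative:

```c
#include <assert.h>
#include <stdint.h>

#define OP3_STA   0x14u  /* STA: store word into alternate space */
#define ASI_MDOUT 0x46u  /* MDOUT ASI, from the text above */

/* MINST word for "sta %rs, [%g0] 0x46". In SPARC V8 stores the source
 * register sits in the rd field (bits 29:25); rs1 = rs2 = %g0, i = 0. */
static uint32_t minst_store_mdout(unsigned rs) {
    assert(rs < 32);
    return (3u << 30) | ((uint32_t)rs << 25) | (OP3_STA << 19) | (ASI_MDOUT << 5);
}
```

For example, `minst_store_mdout(1)` yields 0xC2A008C0, the word for `sta %g1, [%g0] 0x46`.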
22.3.5 Scan Status (MSTAT)


The MSTAT scan chain is 13 bits long and contains SuperSPARC scan-based debug status. This information can be retrieved only through a JTAG scan operation. The MSTAT register must be polled after every scan-based debug instruction to confirm that the instruction completed execution and that no error occurred. The MSTAT register is cleared on TAP controller reset. The MSTAT format is:

    12       11     10     9      8      7      6
| ECHOTMR | MACK | TMRM | CBKM | ZICM | DBKM | ZCCM |

    5       4         3        2       1       0
| IPND | ERRMODE | MIFLTD | PFPX | MIDONE | FQE |


ECHOTMR   Echoed MCMD.MENTER. ECHOTMR is an echoed version of MCMD.MENTER after it has passed through TCK-VCLK-TCK synchronization (TCK is the test clock; VCLK is the SuperSPARC clock). ECHOTMR is asserted asynchronously to scan instruction execution. This bit signals that SuperSPARC has seen the request to enter scan-based debug mode; its assertion does not indicate that SuperSPARC has actually entered scan-based debug mode. This bit is cleared on JTAG TAP controller reset.


MACK Scan-Based Debug Acknowledge. MACK is an indi- 
cation that SuperSPARC is in scan-based debug 
mode. It is asserted as soon as SuperSPARC enters 
scan-based debug mode and stays active until Su- 
perSPARC leaves scan-based debug mode. It is syn- 
chronized to TCK (refer to chapter 21 - JTAG). This 
bit is cleared upon TAP controller reset. 


Note:

TMRM, CBKM, ZICM, DBKM, and ZCCM qualify MACK to identify to the SDB the cause of SuperSPARC's entry into scan-based debug mode. More than one of these bits can be asserted, indicating that there was more than one cause.
TMRM      Entered scan-based debug mode from MENTER request. TMRM indicates that SuperSPARC entered scan-based debug mode due to an MCMD.MENTER request from the SDB. This bit is cleared on JTAG TAP controller reset, SuperSPARC hardware reset, or by updating the MCI register.

CBKM      Code Address Breakpoint. CBKM indicates that SuperSPARC entered scan-based debug mode due to a code address breakpoint (see Chapter 15). This bit is cleared on JTAG TAP controller reset, SuperSPARC hardware reset, or by updating the MCI register.

ZICM      Zero Instruction Count Breakpoint. ZICM indicates that SuperSPARC entered scan-based debug mode due to a zero-instruction-count breakpoint (see Chapter 15). This bit is cleared on JTAG TAP controller reset, SuperSPARC reset, or by updating the MCI register.

DBKM      Data Address Breakpoint. DBKM indicates that SuperSPARC entered scan-based debug mode due to a data-address breakpoint (see Chapter 15). This bit is cleared on JTAG TAP controller reset, SuperSPARC reset, or by updating the MCI register.

ZCCM      Zero Cycle Count. ZCCM indicates that SuperSPARC entered scan-based debug mode due to a zero-cycle-count breakpoint (see Chapter 15). This bit is cleared on JTAG TAP controller reset, SuperSPARC reset, or by updating the MCI register.

IPND      Pending Interrupt. IPND indicates that the processor has a pending interrupt request at a level higher than the current SuperSPARC PSR.IPL, or at level 15. IPND informs the SDB that an interrupt is pending and that SuperSPARC should be released to service it. This bit is cleared on JTAG TAP controller reset.
ERRMODE   Error Mode. ERRMODE indicates that SuperSPARC has entered error mode as a result of a fault generated while in scan-based debug mode. Any exception occurring during a scan-based debug instruction sequence forces SuperSPARC into error mode. Entering error mode induces the watchdog reset sequence, sets MSTAT.ERRMODE, and exits scan-based debug mode. The ERRMODE bit stays asserted until the next MSTAT update operation or a JTAG TAP controller reset.


Note:

When error mode occurs while in scan-based debug mode, no assertion of MEXIT or MRESET is needed for SuperSPARC to leave scan-based debug mode and restart execution at the reset vector.


MIFLTD    Data Access Fault. MIFLTD indicates that a scan-based debug instruction with a data memory reference caused a data access fault. It does not indicate whether other exceptions (for example, pending FP exceptions) occurred during scan-based debug mode instruction execution. MIFLTD is meaningful only when qualified by the assertion of the MIDONE status bit. A scan-based debug instruction exception does not update MFSR or MFAR; only MSFSR (the shadow FSR) is updated. MSFSR is cleared on entry to scan-based debug mode (see Chapter 10 for more details). The SDB service processor must check the MIFLTD status bit after every scan-based debug instruction that references memory. If the bit is set, the SDB is responsible for clearing it, either with an explicit STA instruction to the MSFSR, a JTAG TAP controller reset, an update of MCI, or a SuperSPARC reset.

PFPX      Pending Floating-Point Exception. PFPX indicates that a floating-point exception is pending. The exception can arise in two ways: FPOPs issued before scan-based debug mode was entered (which were already in the FQ and continued to execute while the processor was in scan-based debug mode) can generate an FP exception, and scan-based debug FPOPs can also generate an FP exception.
Note:

Before issuing a floating-point-related scan-based debug instruction, the SDB must make sure that all FPOPs have cleared from the FQ (MSTAT.FQE = 1) and that no FP exception is pending (PFPX = 0).


Any taken floating-point exception in scan-based debug mode will cause er- 
ror mode. The PFPX bit is cleared by clearing out the floating-point queue 
(FQ), by a SuperSPARC hardware reset, or by a JTAG TAP controller reset. 





MIDONE    Scan-Based Debug Instruction Completed. MIDONE is asserted when a scan-based debug instruction completes execution. No further scan-based debug instructions should be issued until MIDONE indicates that the current one has completed. If the scan-based debug instruction is an FPOP, MIDONE assertion only means that the FPOP was successfully issued to the FPU. Therefore, before issuing another FP-related scan-based debug instruction, the SDB must confirm error-free execution of the previous FPOP, which is indicated by FQE = 1 and PFPX = 0. The MIDONE bit is cleared by JTAG TAP controller reset, by reading MSTAT, by a subsequent entry into scan-based debug mode, or by SuperSPARC hardware reset.

FQE       FQ Empty. FQE indicates that the floating-point queue (FQ) is empty; it is asserted when all FPOPs have finished error-free execution.
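The polling rules above reduce to two checks on each MSTAT sample. A minimal C sketch, assuming the bit positions shown in the MSTAT format (ECHOTMR at bit 12 down to FQE at bit 0) and a 13-bit value already scanned out into a `uint16_t`; the function names are illustrative. Because reading MSTAT clears MIDONE, the caller should act on the sample it has just read rather than rescanning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MSTAT bit positions, read from the 13-bit MSTAT format. */
enum {
    MSTAT_FQE = 0,     MSTAT_MIDONE = 1, MSTAT_PFPX = 2,  MSTAT_MIFLTD = 3,
    MSTAT_ERRMODE = 4, MSTAT_IPND = 5,   MSTAT_ZCCM = 6,  MSTAT_DBKM = 7,
    MSTAT_ZICM = 8,    MSTAT_CBKM = 9,   MSTAT_TMRM = 10, MSTAT_MACK = 11,
    MSTAT_ECHOTMR = 12
};

static bool mstat_bit(uint16_t mstat, int bit) {
    assert(bit >= 0 && bit <= 12);
    return ((mstat >> bit) & 1u) != 0;
}

/* True when the last scan-based debug instruction completed (MIDONE)
 * without a data access fault (MIFLTD) and without forcing error mode. */
static bool mstat_insn_ok(uint16_t mstat) {
    return mstat_bit(mstat, MSTAT_MIDONE) &&
           !mstat_bit(mstat, MSTAT_MIFLTD) &&
           !mstat_bit(mstat, MSTAT_ERRMODE);
}

/* Precondition for issuing the next FP-related scan-based debug
 * instruction: FQ empty (FQE = 1) and no pending FP exception (PFPX = 0). */
static bool mstat_fp_ready(uint16_t mstat) {
    return mstat_bit(mstat, MSTAT_FQE) && !mstat_bit(mstat, MSTAT_PFPX);
}
```

For an FPOP, a host would typically wait for `mstat_insn_ok`, then keep polling until `mstat_fp_ready` holds (or PFPX reports an exception) before issuing the next FP-related instruction.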
22.4 Scan-Based Debug Registers in ASI Space


There are four ASI-mapped registers (or register pairs) that directly support scan-based debug: MPC/MNPC, MTMP1/MTMP2, MDIN, and MDOUT. The following subsections briefly describe each of them.


22.4.1 Scan-Based Debug Exit PC/nPC Registers (MPC/MNPC) 


ASI  | Function                          | Access  | Size
0x47 | Scan-based debug exit PC (MPC)    | LDA/STA | Single
0x48 | Scan-based debug exit nPC (MNPC)  | LDA/STA | Single


The PC/nPC pair of the instruction executing when scan-based debug mode entry occurs is written to the MPC/MNPC registers. These values are the branch targets when SuperSPARC resumes normal execution upon exit from scan-based debug mode. The SDB can examine or alter these registers through LDA/STA accesses to ASIs 0x47 and 0x48.
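Altering MPC/MNPC redirects where execution resumes. A minimal sketch of a host-side plan, under stated assumptions: MPC is at ASI 0x47 and MNPC at ASI 0x48 (the order in which the text lists them), MTMP1 (ASI 0x40) holds %g1 while it is borrowed, and resuming sequentially at `target` means MNPC = target + 4, as in ordinary SPARC PC/nPC sequencing. The `sdb_step` type and function names are hypothetical; each EXEC step would be issued and polled as described in Subsection 22.6.2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OP3_LDA   0x10u
#define OP3_STA   0x14u
#define ASI_MTMP1 0x40u   /* from the text */
#define ASI_MDIN  0x44u   /* from the text */
#define ASI_MPC   0x47u   /* assumption: exit PC  */
#define ASI_MNPC  0x48u   /* assumption: exit nPC */
#define REG_G1    1u

typedef enum { STEP_SCAN_MDIN, STEP_EXEC } step_kind;

/* One host action: scan `value` into MDIN, or issue `value` as MINST. */
typedef struct { step_kind kind; uint32_t value; } sdb_step;

/* "lda [%g0] asi, %reg" (op3 = LDA) or "sta %reg, [%g0] asi" (op3 = STA). */
static uint32_t asi_word(unsigned op3, unsigned reg, unsigned asi) {
    return (3u << 30) | (reg << 25) | (op3 << 19) | (asi << 5);
}

/* Plan the steps that make execution resume at `target` on exit:
 * MPC = target, MNPC = target + 4. %g1 is saved and restored via MTMP1. */
static size_t plan_set_resume(uint32_t target, sdb_step out[8]) {
    assert((target & 3u) == 0);               /* instructions are word aligned */
    size_t n = 0;
    out[n++] = (sdb_step){STEP_EXEC, asi_word(OP3_STA, REG_G1, ASI_MTMP1)};
    out[n++] = (sdb_step){STEP_SCAN_MDIN, target};
    out[n++] = (sdb_step){STEP_EXEC, asi_word(OP3_LDA, REG_G1, ASI_MDIN)};
    out[n++] = (sdb_step){STEP_EXEC, asi_word(OP3_STA, REG_G1, ASI_MPC)};
    out[n++] = (sdb_step){STEP_SCAN_MDIN, target + 4u};
    out[n++] = (sdb_step){STEP_EXEC, asi_word(OP3_LDA, REG_G1, ASI_MDIN)};
    out[n++] = (sdb_step){STEP_EXEC, asi_word(OP3_STA, REG_G1, ASI_MNPC)};
    out[n++] = (sdb_step){STEP_EXEC, asi_word(OP3_LDA, REG_G1, ASI_MTMP1)};
    return n;
}
```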









22.4.2 Scan Temporary Registers (MTMP1/MTMP2) 


ASI  | Function                           | Access  | Size
0x40 | Scan temporary register 1 (MTMP1)  | LDA/STA | Single
0x41 | Scan temporary register 2 (MTMP2)  | LDA/STA | Single


The MTMP1 and MTMP2 registers are useful for temporarily storing information while in scan-based debug mode. If more than two words of information need to be stored temporarily, the SDB must scan them out through MDOUT and later restore them through MDIN. The MTMP1 and MTMP2 registers are accessible through ASIs 0x40 and 0x41.






22.4.3 Scan Data In (MDIN) Register 


ASI  | Function             | Access | Size
0x44 | Scan Data In (MDIN)  | LDA    | Single


A scan-based debug instruction sequence can move incoming scan-based debug data from the JTAG MDIN register into the integer unit register file using an LDA instruction. Subsequent scan-based debug instructions can use this data to update memory or SuperSPARC state. The MDIN register is read-only and is accessed through ASI 0x44. See Subsection 22.3.2 for more details on MDIN.
22.4.4 Scan Data Out (MDOUT) Register


ASI  | Function               | Access | Size
0x46 | Scan Data Out (MDOUT)  | STA    | Single


A scan-based debug instruction sequence can move outgoing SuperSPARC 
state data from the integer unit register file to the scan-based debug MDOUT 
register by using an STA. The SDB can then use this information to display pro- 
cessor state or check status. This register is accessible through ASI 0x46. See 
Subsection 22.3.4 for more details on MDOUT. 
22.5 Entering Scan-Based Debug Mode


Scan-based debug mode can be entered by any one of six methods:

❑ Set MCMD.MENTER to force scan-based debug mode.

❑ Execute SIGM with MCMD.INITM set.

❑ Through a Code Address Breakpoint.

❑ Through a Data Address Breakpoint.

❑ Through an Instruction Counter Breakpoint.

❑ Through a Cycle Counter Breakpoint.

The SDB uses the first method to request scan-based debug mode unconditionally, by using the JTAG TAP controller to assert MCMD.MENTER. When MCMD.MENTER is set, the processor enters scan-based debug mode on the next SPARC instruction. SIGM (a special SuperSPARC instruction) also allows scan-based debug mode entry. The breakpoint methods require ASI registers to be configured to generate a scan-based debug mode request conditionally. See Table 15-1 for a further description of how to set up the breakpoints.
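When the SDB finds MSTAT.MACK set, the TMRM, CBKM, ZICM, DBKM, and ZCCM bits tell it which of these methods caused entry. A minimal C sketch that lists the causes, assuming the MSTAT bit positions given in Subsection 22.3.5 (MACK at bit 11, TMRM at bit 10 down to ZCCM at bit 6); `describe_entry` is an illustrative name. The MSTAT description names no dedicated bit for SIGM entry, so a SIGM-initiated entry may report zero causes here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* MSTAT cause bits that qualify MACK. */
static const struct { unsigned bit; const char *name; } entry_causes[] = {
    {10, "MENTER request (TMRM)"},
    {9,  "code address breakpoint (CBKM)"},
    {8,  "zero instruction count breakpoint (ZICM)"},
    {7,  "data address breakpoint (DBKM)"},
    {6,  "zero cycle count breakpoint (ZCCM)"},
};

/* Writes a comma-separated list of entry causes into buf and returns how
 * many were found (several may be set at once). Returns -1 if MACK is
 * clear, i.e. SuperSPARC is not in scan-based debug mode. */
static int describe_entry(uint16_t mstat, char *buf, size_t len) {
    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    if (((mstat >> 11) & 1u) == 0)            /* MACK */
        return -1;
    int found = 0;
    for (size_t i = 0; i < sizeof entry_causes / sizeof entry_causes[0]; i++) {
        if ((mstat >> entry_causes[i].bit) & 1u) {
            size_t used = strlen(buf);
            snprintf(buf + used, len - used, "%s%s",
                     found ? ", " : "", entry_causes[i].name);
            found++;
        }
    }
    return found;
}
```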
22.6 Scan-Based Debug Operation


22.6.1 Scan-Based Debug Execution Details 


This subsection provides some additional information on how SuperSPARC 
operates in scan-based debug mode. 


22.6.1.1 State During Scan-Based Debug Mode 


On entry to scan-based debug mode, a store buffer copyout is initiated. The MCNTL.SB bit is set to 0, turning off the store buffer, so all store instructions are synchronous and bypass the store buffer. In scan-based debug mode, all instructions execute in supervisor mode, and multiple-instruction execution is disabled. PSR.CWP, PSR.PS, and TBR.TT retain the values they had before scan-based debug mode entry. The MCNTL.NF bit is asserted, and the PSR.ET bit is cleared. The MFSR and MFAR registers are unaffected by scan-based debug faults; the fault information is written into shadow MFSR and MFAR registers reserved exclusively for scan-based debug mode. The FPU remains enabled and continues executing instructions without being aware that the processor has entered scan-based debug mode.


Table 22-7 lists the possible states that the SSP can be in while in scan-based debug mode.


Table 22-7. SuperSPARC State in Scan-Based Debug Mode

Register/Bit Affected | State
MCNTL.NF              | asserted
PSR.ET                | negated
ACTION.MIX            | negated
MCNTL.SB              | negated
22.6.1.2 Exceptions During Scan-Based Debug Instruction Execution


The SSP executes instructions with PSR.ET disabled during scan-based debug mode. Any synchronous exception that is reported causes the SSP to enter error mode and set MSTAT.ERRMODE, which induces a watchdog reset. Because PSR.ET was disabled on entry to scan-based debug mode, asynchronous exceptions, such as priority interrupts, data store exceptions, and floating-point exceptions, are ignored.


If a scan-based debug data memory reference instruction faults, it sets the 
MSTAT.MIFLTD bit but does not enter error mode. The shadow FSR is set with 
the appropriate values to describe the cause of the fault. The SDB needs to 
read this register when the MSTAT.MIFLTD bit is set to determine the cause 
of the fault and also to clear out the shadow FSR. Failure to do this will leave 
MSTAT.MIFLTD asserted and can give subsequent scan-based debug in- 
structions a faulty status indication. 


MSTAT.IPND informs the SDB that a priority interrupt is being deferred while 
in scan-based debug mode. The SDB will need to determine whether to exit 
scan-based debug mode to allow the processor to deal with the priority inter- 
rupt. 


MSTAT.PFPX informs the SDB that a floating-point exception is pending while in scan-based debug mode. Floating-point exceptions during scan-based debug mode are of particular concern because they require elaborate trap handlers. Before issuing a floating-point-related instruction, examine all the status bits that indicate an error-free condition (see Subsection 22.3.5).


22.6.1.3 Legal and Illegal Scan-Based Debug Instructions 


The scan-based debug instruction in MINST must be a legal SPARC instruction. However, only a subset of the SPARC instruction set is supported during scan-based debug mode. In general, instructions that affect the flow of execution outside scan-based debug mode are not supported; a branch instruction is a simple example. There is no reason to support control transfer instructions in scan-based debug mode. The operation of the processor in scan-based debug mode on these illegal instructions is undefined.


Legal scan-based debug instructions:

❑ All legal memory reference instructions (including ASI accesses and atomic accesses).

❑ All arithmetic and logical instructions, except trapping tagged arithmetic.

❑ SETHI.

❑ All integer state register accesses.

Illegal scan-based debug instructions (but legal SPARC instructions):

❑ All control transfer instructions (CALL, Bicc, FBfcc, JMPL, and RETT).

❑ All software traps (Ticc).

❑ The FLUSH instruction.

❑ All trapping tagged arithmetic.

❑ SAVE and RESTORE (manipulate CWP directly instead).

❑ All illegal instructions.

Many of these illegal operations will cause entry into error mode, force SuperSPARC to leave scan-based debug mode, and induce a non-scan-based debug watchdog reset.
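An SDB can screen MINST words against this list before scanning them in. A minimal C sketch, assuming standard SPARC V8 field positions (op in bits 31:30, op2 in bits 24:22, op3 in bits 24:19) and op3 values (TADDccTV 0x22, TSUBccTV 0x23, JMPL 0x38, RETT 0x39, Ticc 0x3A, FLUSH 0x3B, SAVE 0x3C, RESTORE 0x3D). It is a conservative filter, not a full decoder: among op = 0 words it accepts only SETHI (which also covers NOP), so UNIMP, branches, and unimplemented op2 values are rejected. The function name is illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Conservative check of a SPARC V8 instruction word against the list of
 * instructions that are illegal in scan-based debug mode. */
static bool sdb_insn_allowed(uint32_t insn) {
    unsigned op = insn >> 30;
    if (op == 1)
        return false;                          /* CALL */
    if (op == 0) {
        unsigned op2 = (insn >> 22) & 7u;
        return op2 == 4;                       /* SETHI/NOP only: rejects UNIMP,
                                                  Bicc, FBfcc, CBccc */
    }
    if (op == 2) {
        unsigned op3 = (insn >> 19) & 0x3Fu;
        switch (op3) {
        case 0x22: case 0x23:                  /* TADDccTV, TSUBccTV */
        case 0x38: case 0x39:                  /* JMPL, RETT */
        case 0x3A:                             /* Ticc */
        case 0x3B:                             /* FLUSH */
        case 0x3C: case 0x3D:                  /* SAVE, RESTORE */
            return false;
        default:
            return true;                       /* arithmetic, logical, state regs */
        }
    }
    return true;                               /* op == 3: memory reference */
}
```

For example, it rejects 0x9DE3BFA0 (`save %sp, -96, %sp`) and accepts 0x01000000 (NOP).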


22.6.1.4 Compound Scan-Based Debug Protocol Commands 


Separate control bits are used to enter scan-based debug mode, execute a scan-based debug instruction, and exit. These bits can be combined to optimize scan-based debug sequences. The simplest scan-based debug sequence writes a single scan-based debug instruction into MCI.MINST and sets the MENTER, MEXEC, and MEXIT bits. This causes SuperSPARC to enter scan-based debug mode, execute the scan-based debug instruction, and then resume execution. The entire sequence requires only a few cycles to execute after MCI is scanned in through the JTAG interface. Table 22-8 describes the valid compound scan-based debug sequences.
Table 22-8. Valid Compound Scan-Based Debug Mode Sequences

MENTER | MEXEC | MEXIT | Description
1      | 1     | 0     | Enter scan-based debug mode, issue a command, then pause awaiting another MEXEC or MEXIT.
0      | 1     | 0     | Once in scan-based debug mode, issue a new scan-based debug instruction, then pause awaiting another MEXEC or MEXIT.
0      | 1     | 1     | Once in scan-based debug mode, issue a new scan-based debug instruction, exit scan-based debug mode upon completion, then resume execution at the captured (or altered) non-scan-based debug mode PC pair.
x      | 0     | 0     | This is a NOP in scan-based debug mode.
1      | 0     | 0     | Will cause entry into scan-based debug mode, and wait.
0      | 0     | 1     | Once in scan-based debug mode, immediately resume execution at the captured (or altered) non-scan-based debug mode PC pair.
1      | 0     | 1     | Will start entry into scan-based debug mode, but then immediately exits. No scan-based debug instruction is executed, and MSTAT.MACK is not updated. To the programmer it might appear that scan-based debug mode was never entered. (This sequence is not very useful.)

Simultaneous assertion of MEXEC and MEXIT can lead to hazardous timing races in sampling MSTAT.MIDONE that cause unfavorable interference from the SDB. On completion of the last scan-based debug instruction in the current scan-based debug session, this compound MCMD mode exits scan-based debug mode; MSTAT.MIDONE is set, and MSTAT.MACK is negated.


The SDB must check MSTAT to make sure that the previous scan-based debug instruction has completed and was error-free. A re-entry into scan-based debug mode clears MSTAT.MIDONE and MSTAT.MIFLTD, and it could do so before the SDB checks MSTAT. In this case, the SDB would mistakenly assume that the last scan-based debug instruction had not completed.


It is recommended that an MEXEC be issued without MEXIT. An immediate 
re-entry into scan-based debug mode will allow the scan-based debug pro- 
gram to differentiate between the end of the first scan-based debug session 
and the start of the second scan-based debug session. When the scan-based 
debug instruction completes with no faults, an MEXIT can be issued. 
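The recommended pattern (MEXEC without MEXIT for each instruction, then a separate MEXIT once the last instruction is known to have completed without faults) can be captured in a small helper that chooses the MCMD bits for each scan of a session. A minimal C sketch; bit positions follow Table 22-5 (MEXIT bit 33, MEXEC bit 34, MENTER bit 35), and `session_mcmd` is an illustrative name. The SDB is still expected to poll MSTAT between scans.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MCMD_MEXIT  (1ULL << 33)
#define MCMD_MEXEC  (1ULL << 34)
#define MCMD_MENTER (1ULL << 35)

/* MCMD bits for scan `step` (0-based) of a session that executes `count`
 * instructions. Scans 0..count-1 carry MEXEC without MEXIT; scan `count`
 * carries MEXIT alone and should be issued only after MSTAT shows that
 * the last instruction completed without faults.
 * Comments give (MENTER, MEXEC, MEXIT) as in Table 22-8. */
static uint64_t session_mcmd(size_t step, size_t count, int already_in_mode) {
    assert(count > 0 && step <= count);
    if (step == count)
        return MCMD_MEXIT;                      /* (0, 0, 1) */
    if (step == 0 && !already_in_mode)
        return MCMD_MENTER | MCMD_MEXEC;        /* (1, 1, 0) */
    return MCMD_MEXEC;                          /* (0, 1, 0) */
}
```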
22.6.2 Scan-Based Debug Sequences


Issuing each scan-based debug instruction requires the SDB to send multiple JTAG scan sequences. Because the MCI register contains both the MINST and MCMD fields, only one register needs to be loaded to issue a single scan-based debug instruction. Depending on the instruction, MDIN may need to be loaded before the instruction is executed. For more complex scan-based debug sequences, processor state must be preserved before any state is modified, which involves many scan-based debug instructions.


Unless SuperSPARC is in error mode, the SDB can force SuperSPARC into scan-based debug mode by scanning in an asserted MCMD.MENTER value. The SDB polls MSTAT for an indication that the instruction is complete. The scan sequence for emulating an individual instruction consists of some or all of the following steps:


Scan-In   any required pointers and data into MDIN.

Scan-In   the scan-based debug instruction (MCI.MINST) and scan-based debug protocol command (MCMD), including the MEXEC and, optionally, the MEXIT bits.

Scan-Out  the MSTAT scan-based debug status register to determine when the scan-based debug instruction has completed, faulted, or induced error mode. This poll also indicates whether any prioritized interrupt is currently at SuperSPARC's pins.

Scan-Out  any requested SuperSPARC system state data from MDOUT.
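The four steps above can be sketched as a host-side routine. This is a minimal illustration under stated assumptions: the `scan_*` hooks and the `sim_target` structure are hypothetical stand-ins for an SDB's JTAG driver and a real target (the simulated target simply reports MIDONE after a fixed number of polls), MSTAT bit positions follow Subsection 22.3.5, MEXEC is bit 34 of MCI, and the poll limit is arbitrary. Because reading MSTAT clears MIDONE, the routine acts on each sample as soon as it is read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MCMD_MEXEC    (1ULL << 34)
#define MSTAT_MIDONE  (1u << 1)
#define MSTAT_MIFLTD  (1u << 3)
#define MSTAT_ERRMODE (1u << 4)

/* Hypothetical simulated target standing in for SuperSPARC + JTAG driver. */
typedef struct {
    uint32_t mdin, mdout;
    uint64_t mci;
    int      polls_until_done;   /* simulated instruction latency, in polls */
    uint16_t mstat;
} sim_target;

static void scan_in_mdin(sim_target *t, uint32_t v) { t->mdin = v; }
static void scan_in_mci(sim_target *t, uint64_t mci) { t->mci = mci; }
static uint32_t scan_out_mdout(const sim_target *t) { return t->mdout; }

static uint16_t scan_out_mstat(sim_target *t) {
    if (t->polls_until_done > 0 && --t->polls_until_done == 0)
        t->mstat |= MSTAT_MIDONE;
    uint16_t s = t->mstat;
    t->mstat &= (uint16_t)~MSTAT_MIDONE;       /* reading MSTAT clears MIDONE */
    return s;
}

typedef enum { EMU_OK, EMU_FAULT, EMU_ERRMODE, EMU_TIMEOUT } emu_result;

/* Optional MDIN scan-in, MCI scan-in with MEXEC, MSTAT polling,
 * then optional MDOUT scan-out. */
static emu_result emulate_one(sim_target *t, const uint32_t *mdin,
                              uint32_t minst, uint32_t *mdout, int max_polls) {
    assert(t != NULL && max_polls > 0);
    if (mdin != NULL)
        scan_in_mdin(t, *mdin);
    scan_in_mci(t, MCMD_MEXEC | minst);
    for (int i = 0; i < max_polls; i++) {
        uint16_t s = scan_out_mstat(t);
        if (s & MSTAT_ERRMODE)
            return EMU_ERRMODE;
        if (s & MSTAT_MIDONE) {
            if (s & MSTAT_MIFLTD)
                return EMU_FAULT;
            if (mdout != NULL)
                *mdout = scan_out_mdout(t);
            return EMU_OK;
        }
    }
    return EMU_TIMEOUT;
}
```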


22.6.3 Scan-Based Debug Instruction Sequences for Common SDB Functions 


This subsection describes possible scan-based debug instruction sequences for some common primitives. Many other implementations are possible, and the primitives can be combined to create other functions as needed. The primitives described are:

❑ Read/Write Integer Registers.

❑ Read/Write Integer Control Registers (PSR, WIM, TBR, and Y).

❑ Read/Write Floating-Point Registers.

❑ Read/Write Floating-Point Control Registers.

❑ Read/Write Memory (byte, half, word, double, etc.).

❑ Read/Write Memory (Normal and ASI).

❑ Set Code and Data Address Breakpoints.

❑ Single-Step.

❑ Run for N Cycles.

❑ Run until Breakpoint Reached.


The next subsections provide detailed sequences for each of the above operations. These sequences use the symbolic constants described in Table 22-9. All sequences assume that the processor has entered scan-based debug mode without fault.


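Table 22-9. Symbolic Constants for Scan-Based Debug Sequences

Constant | Description
XREG     | Scan-based debug register
XADDR    | Scan-based debug memory address
mdin     | ASI of the MDIN register (0x44)
mdout    | ASI of the MDOUT register (0x46)
mtmp1    | ASI of the MTMP1 register (0x40)
mtmp2    | ASI of the MTMP2 register (0x41)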


22.6.3.1 Integer Register File Read 


The most basic SDB operation is reading an integer register. The scan-based debug instruction sequence in Example 22-1 transfers the contents of integer register XREG in the current register window (selected by the Current Window Pointer, CWP) to the MDOUT register. Once in MDOUT, the data can be scanned out to the SDB.


Example 22-1. Integer Register File Read 


// copy XREG to mdout; scan it out.
scan_in("sta %XREG, [%g0] mdout", mci, poll);
scan_out(mdout, SDB);
22.6.3.2 Integer Register File Write


To store a new value (or restore an old value) into an integer register at the 
CWP, the sequence in Example 22-2 can be used. 


Example 22-2. Integer Register File Write

// scan in new value; install it into integer register XREG.
scan_in(new_integer_register_value, mdin);
scan_in("lda [%g0] mdin, %XREG", mci, poll);


22.6.3.3 Integer State Register Read 


The sequence in Example 22-3 transfers any of the integer state registers (PSR, WIM, TBR, and Y) into the MDOUT register; the example reads the PSR. Note the use of the MTMP1 register to preserve and restore integer register state.


Example 22-3. Integer State Register Read
// prologue: save %g1
scan_in("sta %g1, [%g0] mtmp1", mci, poll);
// copy PSR to mdout through %g1; scan it out.
scan_in("rd %psr, %g1", mci, poll);
scan_in("sta %g1, [%g0] mdout", mci, poll);
scan_out(mdout, SDB);
// epilogue: restore %g1
scan_in("lda [%g0] mtmp1, %g1", mci, poll);


22.6.3.4 Integer State Register Write 


The sequence in Example 22-4 modifies an integer state register; the PSR is used as the example.


Example 22-4. Integer State Register Write
// prologue: save %g1
scan_in("sta %g1, [%g0] mtmp1", mci, poll);
// scan in new value to %g1 and install it into the PSR.
scan_in(new_psr_value, mdin);
scan_in("lda [%g0] mdin, %g1", mci, poll);
scan_in("wr %g1, %psr", mci, poll);
// epilogue: restore %g1
scan_in("lda [%g0] mtmp1, %g1", mci, poll);


22.6.3.5 Memory Read 
The sequence in Example 22-5 reads a signed byte at a given memory address XADDR within an implicit alternate address space (the supervisor data space, ASI 0xB). This ASI is used because the processor is effectively executing in supervisor mode.
Example 22-5. Memory Read

// prologue: save %g1 and %g2
scan_in("sta %g1, [%g0] mtmp1", mci, poll);
scan_in("sta %g2, [%g0] mtmp2", mci, poll);
// scan in XADDR; copy it to %g1.
scan_in(xaddr_value, mdin);
scan_in("lda [%g0] mdin, %g1", mci, poll);
// load %g2 with *XADDR; copy it to mdout; scan it out.
scan_in("ldsb [%g1], %g2", mci, poll);
scan_in("sta %g2, [%g0] mdout", mci, poll);
scan_out(mdout, SDB);
// epilogue: restore %g1 and %g2
scan_in("lda [%g0] mtmp1, %g1", mci, poll);
scan_in("lda [%g0] mtmp2, %g2", mci, poll);


At the conclusion of the sequence, the signed byte at memory address XADDR within the implicit supervisor data space (ASI 0xB) is placed in the MDOUT register. When MSTAT.MIDONE is asserted, the data can be scanned out to the SDB.


The sequences for signed and unsigned half-word, word, and double-word reads are similar, as are reads from alternate address spaces.
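Adapting Example 22-5 to another width or to an alternate space changes only the load opcode. A minimal C sketch that selects the SPARC V8 op3 value, assuming the standard V8 opcode assignments, in which each alternate-space load (LDSBA, LDUHA, LDA, and so on) is its non-alternate counterpart with bit 4 of op3 set. The alternate forms carry an explicit ASI and use register + register addressing (i = 0); LDD and LDDA also require an even destination register. Names are illustrative:

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { W_BYTE = 1, W_HALF = 2, W_WORD = 4, W_DOUBLE = 8 } width;

/* op3 for SPARC V8 integer loads (format 3, op = 3). Signedness only
 * matters for byte and half-word loads. */
static unsigned load_op3(width w, bool is_signed, bool alternate) {
    unsigned op3;
    switch (w) {
    case W_BYTE:   op3 = is_signed ? 0x09u : 0x01u; break;  /* LDSB / LDUB */
    case W_HALF:   op3 = is_signed ? 0x0Au : 0x02u; break;  /* LDSH / LDUH */
    case W_WORD:   op3 = 0x00u; break;                       /* LD  */
    case W_DOUBLE: op3 = 0x03u; break;                       /* LDD */
    default: assert(!"unsupported width"); return 0;
    }
    return alternate ? (op3 | 0x10u) : op3;                  /* ...A forms */
}
```

For example, `load_op3(W_BYTE, true, false)` returns 0x09 (the LDSB used in Example 22-5), and `load_op3(W_BYTE, true, true)` returns 0x19 (LDSBA).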


22.6.3.6 Write Memory 


The sequence in Example 22-6 modifies memory by writing a byte value at a given memory address XADDR within an implicit alternate address space (the supervisor data space, ASI 0xB).


Example 22-6. Write Memory


// prologue
scan_in("sta %g1, [%g0] mtmp1", mci, poll)
scan_in("sta %g2, [%g0] mtmp2", mci, poll)

// Scan in XADDR value; copy XADDR to g1.
scan_in(XADDR VALUE, mdin)
scan_in("lda [%g0] mdin, %g1", mci, poll)

// Scan in NEW DATA value; copy NEW DATA to g2.
scan_in(NEW DATA VALUE, mdin)
scan_in("lda [%g0] mdin, %g2", mci, poll)

// Install NEW DATA at XADDR.
scan_in("stb %g2, [%g1]", mci, poll)

// epilogue
scan_in("lda [%g0] mtmp1, %g1", mci, poll)
scan_in("lda [%g0] mtmp2, %g2", mci, poll)


During a memory write, the SDB uses MDIN twice. It first scans into MDIN the
memory address XADDR to be written. Once this address has been transferred
to the integer register file, the SDB scans in the new value (NEW DATA
VALUE) to be written at the specified memory location (XADDR).
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The sequences for half-word, word, and double-word writes are very similar. 
Writes to alternate spaces are also similar, with the STB instruction replaced 
by an STBA. 


22.6.3.7 Floating-Point Register Read 


Example 22-7 transfers a floating-point register XREG into the MDOUT register
for the SDB to scan out. Because the value must pass through memory, the
sequence uses a background memory word whose address (XADDR) is scanned in
through MDIN; for simplicity, that address is assumed to be safe to use. The
sequence for reading floating-point control registers is similar.


Example 22-7. Floating-Point Register Read 


// prologue
scan_in("sta %g1, [%g0] mtmp1", mci, poll)
scan_in("sta %g2, [%g0] mtmp2", mci, poll)

// Scan in address of background memory word (XADDR).
// Copy XADDR into %g1; load *XADDR into %g2.
scan_in(XADDR VALUE, mdin)
scan_in("lda [%g0] mdin, %g1", mci, poll)
scan_in("ld [%g1], %g2", mci, poll)

// This scan-based debug primitive must not change non-scan-based
// debug memory state. FP reads require that we alter and then
// restore a background memory word. There are not enough mtmp
// registers to do this, so storage is augmented by scanning out
// to SDB memory.
scan_in("sta %g2, [%g0] mdout", mci, poll)
scan_out(mdout, SDB temp1)

// Before issuing the next SDB instruction,
// make sure mstat.fqe=1 and mstat.pfpx=0.
// Copy FP entry to background word; copy it into IU temp.
scan_in("st %fXREG, [%g1]", mci, poll)
scan_in("ld [%g1], %g2", mci, poll)

// Copy FP entry to mdout; scan out FP entry.
scan_in("sta %g2, [%g0] mdout", mci, poll)
scan_out(mdout, SDB)

// Restore original background memory word.
scan_in(SDB temp1, mdin)
scan_in("lda [%g0] mdin, %g2", mci, poll)
scan_in("st %g2, [%g1]", mci, poll)

// epilogue
scan_in("lda [%g0] mtmp1, %g1", mci, poll)
scan_in("lda [%g0] mtmp2, %g2", mci, poll)


22.6.3.8 Floating-Point Register Write 


The sequence for writing floating-point registers is similar to the reading
sequence, except that the floating-point register is loaded from the
background memory word instead of being stored to it. Example 22-8 shows one
possible code sequence.
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Example 22-8. Floating-Point Register Write 


// prologue
scan_in("sta %g1, [%g0] mtmp1", mci, poll)
scan_in("sta %g2, [%g0] mtmp2", mci, poll)

// Scan in address of background memory word (XADDR).
// Copy XADDR into %g1; load *XADDR into %g2.
scan_in(XADDR VALUE, mdin)
scan_in("lda [%g0] mdin, %g1", mci, poll)
scan_in("ld [%g1], %g2", mci, poll)

// Preserve original background word in SDB temp1. This
// scan-based debug primitive must not change non-scan-based
// debug memory state. FP writes require that we alter and then
// restore a background memory word. There are not enough mtmp
// registers to do this, so storage is augmented by scanning out
// to SDB memory.
scan_in("sta %g2, [%g0] mdout", mci, poll)
scan_out(mdout, SDB temp1)

// Scan in new FP data value; load it into the IU register file
// and write it to the background word.
scan_in(NEW FP DATA VALUE, mdin)
scan_in("lda [%g0] mdin, %g2", mci, poll)
scan_in("st %g2, [%g1]", mci, poll)

// Before issuing the next scan-based debug instruction,
// make sure mstat.fqe=1 and mstat.pfpx=0.
// Install new value into FP register.
scan_in("ld [%g1], %fXREG", mci, poll)

// Restore original background memory word.
scan_in(SDB temp1, mdin)
scan_in("lda [%g0] mdin, %g2", mci, poll)
scan_in("st %g2, [%g1]", mci, poll)

// epilogue
scan_in("lda [%g0] mtmp1, %g1", mci, poll)
scan_in("lda [%g0] mtmp2, %g2", mci, poll)


22.6.3.9 Floating-Point State Register Read 
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Reading values from floating-point-state registers, such as the FSR, is similar 
to the previous two examples; the floating-point memory references are
replaced by store FSR (STFSR) operations. Extracting entries from the
floating-point queue (FQ) is more difficult. While in scan-based debug mode,
FQ entries should never be extracted unless MSTAT.PFPX indicates an FP
exception; otherwise, the remaining FPops in the FQ should be allowed to
execute. If PFPX is asserted and the SDB needs to recover, the FQ can be read
with a double-word store (STDFQ). Because MDOUT is only one word wide, two
scan passes are required to transfer a full queue entry to the SDB.
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There is an additional architectural side effect of reading the floating-point 
queue. When an entry is read, it is effectively removed from the queue. Since 
scan-based debug is required to be non-intrusive to program execution, the 
old state of the queue must be restored. If the queue state is to remain
unchanged after being observed, all entries in the queue must be extracted
using STDFQ until it is empty, and all of the removed entries must then be
reinserted (rewritten) into the queue. Although there is no simple
instruction for writing to the FP queue, the queue may be restored by
executing a floating-point instruction while in scan-based debug mode,
because both the instruction and its PC value are stored in the queue. By
writing the PC into the MDIN register and then issuing the floating-point
instruction as a scan-based debug instruction, the queue entry is restored.


22.6.3.10 Floating-Point State Register Write 


Writing to the floating-point state registers is similar to reading them. The same 
restrictions apply. 


22.6.3.11 Setting Code and Data Address Breakpoints
The following sequence will set up a code or data address breakpoint and re- 
sume normal execution. When the breakpoint occurs, the processor will re-en- 
ter scan-based debug mode. 


A series of scan-based debug instruction sequences is required to write the
breakpoint register set:


1) Write the desired code or data space breakpoint address value (virtual or 
physical) into the Breakpoint Address Register (BKV). 


2) Write the code or data space breakpoint address compare mask into the 
Breakpoint Mask Register (BKM). 


3) Write the breakpoint control register (BKC) to clear BKC.CBKEN and se- 
lect the desired values for BKC.CSPACE, BKC.PAMD, BKC.CBKEN, 
BKC.DBREN, and BKC.DBWEN. 


4) Write the breakpoint status register (BKS) to clear any prior code break- 
point status. 


5) Write the action on event control register (ACTION) so that the desired
breakpoint event will generate a scan-based debug mode request and
(optionally) assert a scan-based debug strobe (ESB) pin. See Subsection
15.2.5 for more details.


Example 22-9 illustrates such a scan-based debug mode request. 
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Example 22-9. Scan-Based Debug Mode Request


// prologue
scan_in("sta %g1, [%g0] mtmp1", mci, poll)
scan_in("sta %g2, [%g0] mtmp2", mci, poll)
scan_in("sta %g3, [%g0] mdout", mci, poll)
scan_out(mdout, SDB temp1)

// set g1 to ASI cbkv addr offset w/in MDIAG ASI;
scan_in("or %g0, bkv, %g1", mci, poll)

// Scan-in lower 32-bit portion (of 36 bits) for next cbkv value;
// install it in g2 of register pair g[23].
scan_in(NEW CBKV VALUE LO32, mdin)
scan_in("lda [%g0] mdin, %g2", mci, poll)

// Scan-in upper 4-bit portion (of 36 bits) for next cbkv value;
// install it in g3 of register pair g[23].
scan_in(NEW CBKV VALUE HI4, mdin)
scan_in("lda [%g0] mdin, %g3", mci, poll)

// install register pair g[23] into mdiag cbkv.
scan_in("stda %g2, [%g1] mdiag", mci, poll)

// set g1 to ASI cbkm addr offset w/in MDIAG ASI;
scan_in("or %g0, bkm, %g1", mci, poll)

// Scan-in lower 32-bit portion (of 36 bits) for next cbkm value;
// install it in g2 of register pair g[23].
scan_in(NEW CBKM VALUE LO32, mdin)
scan_in("lda [%g0] mdin, %g2", mci, poll)

// Scan-in upper 4-bit portion (of 36 bits) for next cbkm value;
// install it in g3 of register pair g[23].
scan_in(NEW CBKM VALUE HI4, mdin)
scan_in("lda [%g0] mdin, %g3", mci, poll)

// install g[23] into cbkm.
scan_in("stda %g2, [%g1] mdiag", mci, poll)

// set g1 to addr cbkc;
scan_in("or %g0, bkc, %g1", mci, poll)

// Scan-in NEW CBKC VALUE.
// read it into the integer register file; install it into CBKC.
scan_in(NEW CBKC VALUE, mdin)
scan_in("lda [%g0] mdin, %g2", mci, poll)
scan_in("sta %g2, [%g1] mdiag", mci, poll)

// set g1 to addr cbks in mdiag; clear cbks.
scan_in("or %g0, bks, %g1", mci, poll)
scan_in("sta %g0, [%g1] mdiag", mci, poll)
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Example 22-9. Scan-Based Debug Mode Request (Continued) 


// Scan-in NEW ACTION VALUE.
scan_in(NEW ACTION VALUE, mdin)

// read it into the integer register file; install it into
// ACTION.
scan_in("lda [%g0] mdin, %g2", mci, poll)
scan_in("sta %g2, [%g0] action", mci, poll)

// epilogue
scan_in("lda [%g0] mtmp1, %g1", mci, poll)
scan_in("lda [%g0] mtmp2, %g2", mci, poll)
scan_in(SDB temp1, mdin)
scan_in("lda [%g0] mdin, %g3", mci, poll)


Setting a breakpoint on a specific data memory address reference is very
similar to the above sequence; the only difference is in the values written
into the memory-mapped BKC and ACTION registers. In the above example,
BKC.CSPACE, BKC.CBKEN, and ACTION.I CBK are set. A write-only data address
breakpoint would clear BKC.CSPACE, BKC.DBFEN, BKC.DBREN, and ACTION.I DBK,
while BKC.DBWEN would be set.


22.6.3.12 “Run-for-N” Instructions/Cycles


Sequences for programming “Run-for-N” instructions and cycles are similar to 
setting code/data breakpoints. (See Example 22-10.) The cycle counter 
breakpoint is most useful for statistically profiling execution of a program. The 
instruction counter breakpoint is useful for single-stepping or block-stepping 
through a program execution. 
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Example 22-10. 
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// prologue 

scan_in("sta %g1, [%g0] mtmp1", mci, poll)
scan_in("sta %g2, [%g0] mtmp2", mci, poll)
scan_in("sta %g3, [%g0] mdout", mci, poll)
scan_out(mdout, SDB temp1)

// set g1 to ASI address for CNTV w/in EDIAG.
scan_in("or %g0, 0x000, %g1", mci, poll)

// scan-in NEW CNTV VALUE for ICNT/CCNT.
// read it into %g2; install into CNTV.
scan_in(NEW CNTV VALUE, mdin)
scan_in("lda [%g0] mdin, %g2", mci, poll)
scan_in("sta %g2, [%g1] cntv", mci, poll)

// clear cnts
scan_in("sta %g0, [%g0] cnts", mci, poll)

// scan-in NEW ACTION VALUE for ACTION.
// read it into %g2; install it into ACTION.
scan_in(NEW ACTION VALUE, mdin)
scan_in("lda [%g0] mdin, %g2", mci, poll)
scan_in("sta %g2, [%g1] action", mci, poll)

// scan-in NEW CNTC VALUE for ICNTEN/CCNTEN.
// read it into %g2; install it into CNTC.
scan_in(NEW CNTC VALUE, mdin)
scan_in("lda [%g0] mdin, %g2", mci, poll)
scan_in("sta %g2, [%g1] cntc", mci, poll)

// epilogue
scan_in("lda [%g0] mtmp1, %g1", mci, poll)
scan_in("lda [%g0] mtmp2, %g2", mci, poll)
scan_in(SDB temp1, mdin)
scan_in("lda [%g0] mdin, %g3", mci, poll)
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22.7 Approximate Latencies for Each SDB Primitive


Each scan-based debug instruction execution requires several multi-bit JTAG 
TDR scan operations. 


The number of scan-based debug instructions required per primitive is:
❑ One to access an integer register entry.
❑ Four to access an integer control register.
❑ Seven to access a memory location.
❑ 13 to access a floating-point register.
❑ 13 to set up an instruction (cycle) counter expiration.
❑ 21 to set up an instruction (data) address breakpoint.


The number of MDIN (or MDOUT) scan operations required per SDB primitive is:
❑ One scan operation per integer register access.
❑ One MDIN scan operation per floating-point register write.
❑ Two (MDIN and MDOUT) scan operations per memory access.
❑ Three MDIN scan operations per setting of a breakpoint or counter event.
Each scan-based debug instruction scan-in and each scan-based debug status
scan-out requires about 50 TCK cycles. Each MDIN scan-in takes about 40 TCK
cycles, as does each MDOUT scan-out.
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Proper clocking is essential at high operating frequencies. This chapter will 
describe essential clock requirements for the SuperSPARC processor (SSP) 
and MultiCache Controller (MXCC). 


In order to reduce system clock skew, a phase locked loop (PLL) is employed 
for each of the clock inputs. 
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23.1 Phase Locked Loop Operation 




The SuperSPARC chips use PLLs to reduce the skew between the clock in- 
puts and points internal to the chips where the clock signals are used. The re- 
duced internal skews allow the timing specifications to be tighter than they 
would be without the use of the PLLs. Tighter timing specifications simplify the 
design and construction of high-speed systems. 


Each PLL operates by constantly measuring internal clock routing delay and
internally generating a clock that is effectively ahead of the external clock by
an amount equal to the internal routing delay. This ensures that internal logic
sees a clock signal with very low skew from the external clock pin.


Figure 23-1 shows the scheme used for reducing skew. The PLL samples the 
input clock and the clock at the end of a balanced distribution tree. The phase 
comparator in the PLL adjusts the phase of the voltage-controlled oscillator 
(VCO) so that the two clocks have the same frequency and phase.





Note: 


Prior to normal operation (i.e., after power-up or when the PLL has been
disabled), the PLL must be allowed time to stabilize. During this time, RESET
should be active. The time required is 100 milliseconds.







Note: 


The JTAG TAP controller must be reset prior to or at the same time as RESET
in order for the PLL to begin initialization. The TAP controller may be
initialized either by asserting the TRST pin or by holding the TMS pin high
for five consecutive cycles of the test clock (TCK). If this reset does not
occur, the PLL clock feedback loop may not be established, and unpredictable
operation may result.





Whenever the JTAG interface is not in use by a particular system, asserting
the TRST signal statically is strongly recommended.


The input clock should never be stopped or changed from its normal periodic 
operation while the PLL is enabled. Doing so will cause PLL instability and un- 
predictable operation. If the clock is changed from its normal regular pattern, 
the change must occur only while RESET is asserted. RESET must remain 
asserted for the stabilization period of at least 100 ms after the clock resumes 
regular periodic operation, regardless of whether the frequency is changed. 


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Figure 23-1. Clock Distribution and the PLL 
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23.1.1 SuperSPARC 


To guarantee PLL stabilization, RESET should be active for at least 100 ms
after power and clock become stable.


The operation of the PLL circuit can be disrupted by noise in its power supply. 


To ensure proper operation of the PLL clock, system noise should be filtered
out of VCCCLK and VSSCLK. Figure 23-2 shows a recommended filter circuit.







Figure 23-2. Typical Phase Locked Loop (PLL) Filter Circuit for SuperSPARC Processor 


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The SuperSPARC PLL relies on an external loop filter capacitor to integrate
phase comparisons. An external capacitor must be connected between PLLRC and
ground. The recommended value is 0.1 µF.


23.1.2 MXCC


The MXCC has two clock inputs, BCLK and PCLK. It therefore has two PLLs. 
Only the PLL bypass control pin (PLLBYP) is shared by the two clocks. 


To guarantee PLL stabilization, RSTIN should be asserted for at least 100 ms 
after the power and clocks are stable to the MXCC. 


The operation of the PLL circuits can be disrupted by noise on the power sup- 
ply. 


To ensure proper operation of the PLL clock, system noise should be filtered
from VCCCKB, VCCCKP, VSSCKP, and VSSCKB. Figure 23-3 shows a recommended
filter circuit.
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Figure 23-3. Typical Phase Locked Loop (PLL) Filter Circuit for MXCC 
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The PLLs rely on external loop filter capacitors to integrate phase
comparisons. External capacitors must be connected between PPLLRC and ground
and between BPLLRC and ground. The recommended value for each of the
capacitors is 0.1 µF.
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23.2 Input Clock Requirements 


The SSP and the MXCC can tolerate most clean, stable clock sources when
the PLL is enabled. With the PLL enabled, the chips use only the rising edge
of the input clocks. Internally, the chips behave as described below.


23.2.1 SuperSPARC 


The SSP multiplies, then divides the clock to provide a stable 50% duty cycle
clock. The input duty cycle must be at least 25% (either high or low). When the
PLL is bypassed, care must be taken to provide a 50% duty cycle clock. Pin
timings for operation with the PLL bypassed are not defined.


23.2.2 MultiCache Controller 


The MXCC doubles the frequency of the input clocks and then halves them to
produce stable clocks with 50% duty cycles. The high time of the input clocks
must be between 25% and 75% of the period.
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23.3 MXCC Synchronous and Asynchronous Operation 


The MXCC has two clock inputs, PCLK and BCLK. PCLK provides timing to
circuits on the processor side of the chip, including the VBus interface, 
E-cache control, and E-cache tags. BCLK provides timing to circuits on the 
system bus side of the chip, including the MBus and XBus interfaces. Data tra- 
versing between the two clock domains passes by way of FIFO queues. Con- 
trol signals traversing between domains pass through synchronizers. 


This organization allows the processor clock to be faster than the bus clock 
and allows for modular processor upgrades without affecting the rest of the 
system. 


MXCC allows for either synchronous or asynchronous operation controlled by 
the SYNC pin. The SYNC pin should not be changed except while RSTIN is 
asserted.


23.3.1 Asynchronous Operation 


When PCLK is faster than BCLK, MXCC must be operated asynchronously. 
Asynchronous operation is selected by deasserting the SYNC pin (H). Due to
the design of the internal synchronizers, PCLK must be at least 10% faster
than BCLK, and the ratio of PCLK to BCLK must not exceed 5 to 1.


23.3.2 Synchronous Operation 


Synchronous operation is selected when the SYNC pin is asserted (L). In
synchronous operation, the synchronizers on control signals between the two
clock domains are defeated. For proper operation, BCLK and PCLK must be
connected to the same clock with very low skew between them (a maximum
of 150 ps of skew between them is recommended).
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MBus Module







This chapter explains how the MBus conversion module connects a Super- 
SPARC processor (SSP) and MultiCache Controller (MXCC) together for use 
on the MBus. 


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24.1 Full Module MBus System 


The Full Module MBus system is diagrammed in Figure 24-1. The external
cache memory provides a significant performance improvement and greatly
decreases bus traffic, which allows more processors to be supported on a
system bus.


Figure 24-1. Full MBus Module Diagram
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The E-cache is organized as a direct-mapped cache with a normal size of 
1 Mbyte. This configuration is implemented with eight 128K x 8 or 128K x 9
synchronous SRAMs. To implement byte parity on the E-cache data storage, the
128K x 9 SRAMs are needed. Parity is directly supported by both the
TMS390Z50 and TMS390Z55.


Synchronous SRAMs have registers on each input and output. This allows
pipelined operation. An address is presented to an SRAM before the active
clock edge, and it is registered in the SRAM at the clock edge. The SRAM
reads out the addressed location before the next active clock edge, and the 
result is stored in an output register at the edge. New addresses can be 
supplied at each clock edge, and new outputs appear after one clock period 
of delay. Writing works similarly with address, data, output enable, and write- 
enable being registered on the active clock edge and stored into the internal 
array during the subsequent clock period. 


The synchronous SRAMs used in the Full Module configurations are available
from several manufacturers.


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24.2 MBus Module Schematics 


The schematics of the MBus Module are presented in Figure 24-2 through 
Figure 24-6. 
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Figure 24-2. Full Module Schematic Diagram (sheet 1 of 5)
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Figure 24-3. Full Module Schematic Diagram (sheet 2 of 5) 
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Figure 24-4. Full Module Schematic Diagram (sheet 3 of 5) 
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Figure 24-5. Full Module Schematic Diagram (sheet 4 of 5)
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Figure 24-6. Full Module Schematic Diagram (sheet 5 of 5) 
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24.3 Mechanical Information 
See Section 7 of the SPARC MBus Interface Specification.


The dimensions of the modules are summarized in Table 24-1.
Through-hole components, components with heatsinks, heatsinks, and other 
tall components that may be on a module will be mounted on the top side of 
the module, opposite from the connector. Only low-profile, surface-mount 
components, if any, will be mounted on the bottom (connector) surface. 


Table 24-1. Dimensions of MBus Modules 





The module mechanical envelope is shown in Figure 24-7. 
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Figure 24-7. Module Mechanical Envelope
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The connector types are given in Table 24-2. The TI module contains a plug, 
and the system board should contain a mating receptacle. TI modules may use
the listed connectors or other equivalent connectors.


Table 24-2. Module Connectors
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This appendix contains a summary of the SuperSPARC processor's (SSP's) 
instruction set. The instructions are formatted to allow easy decoding and en- 
coding. 
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A.1 Instruction Fields


op       This two-bit field encodes all three major formats.

op2      This three-bit field encodes format 2 instructions.

op3      This six-bit field encodes format 3 instructions.

opf      This nine-bit field encodes format 3 floating-point operate
         (FPop) instructions.

a        A value of 1 in this one-bit field annuls the execution of the
         instruction that follows a branch if the branch is conditional
         and untaken, or unconditional and taken.

cond     This four-bit field selects the condition code(s) to test for a
         branch or trap instruction.

rd       This five-bit field is the address of a destination (or source)
         r or f register(s) used in a load (or store) or arithmetic
         instruction. For instructions that read or write a double (or
         quad), the least significant one or two bits are unused and
         should be zero.

rs1      This five-bit field is the address of the first r or f
         register(s) source operand. For instructions that read a double
         (or quad), the least significant one (or two) bits are unused
         and should be zero.

rs2      This five-bit field is the address of the second r or f
         register(s) source operand when the second operand is not an
         immediate value. For instructions that read a double (or quad),
         the least significant one or two bits are unused and should be
         zero.

simm13   This 13-bit field is a sign-extended 13-bit immediate value used
         as the second ALU operand for a load or store instruction or for
         an integer arithmetic instruction.

imm22    This 22-bit field is a constant that the SETHI instruction places
         in the upper 22 bits of a destination register.

disp30   This 30-bit field represents a word-aligned, sign-extended,
         PC-relative displacement for a call instruction.

asi      This eight-bit field is the address space identifier supplied to
         a load or a store alternate instruction.


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†   Denotes a privileged instruction.

‡   Denotes a privileged instruction if the ASR register referenced
    in the instruction is privileged.

Φ   The SuperSPARC processor (SSP) does not support quad-precision
    operands. These instructions will cause an unimplemented FPop
    trap; however, they can be implemented in software.

*   Denotes an instruction specific to the SSP.
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A.2 Format 1 (op = 01) 


op (bits 31-30) | disp30 (bits 29-0)

01  CALL
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A.3 Format 2 (op = 00) 
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Format 2 


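Bit fields: op (31-30), a (29), cond (28-25), op2 (24-22), disp22 (21-0)

op  a  cond  op2  instruction
00  a  0000  010  BN{,A}
00  a  0001  010  BE{,A}
00  a  0010  010  BLE{,A}
00  a  0011  010  BL{,A}
00  a  0100  010  BLEU{,A}
00  a  0101  010  BCS{,A}
00  a  0110  010  BNEG{,A}
00  a  0111  010  BVS{,A}
00  a  1000  010  BA{,A}
00  a  1001  010  BNE{,A}
00  a  1010  010  BG{,A}
00  a  1011  010  BGE{,A}
00  a  1100  010  BGU{,A}
00  a  1101  010  BCC{,A}
00  a  1110  010  BPOS{,A}
00  a  1111  010  BVC{,A}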
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A.4 Format 3 (op = 10 or op = 11) 


Bit fields: op (31-30), rd (29-25), op3 (24-19), rs1 (18-14), i (13),
simm13 (12-0) or asi (12-5) and rs2 (4-0)

op  op3     instruction
10  000000  ADD
10  000001  AND
10  000010  OR
10  000011  XOR
10  000100  SUB
10  000101  ANDN
10  000110  ORN
10  000111  XNOR
10  001000  ADDX
10  001010  UMUL
10  001011  SMUL
10  001100  SUBX
10  001110  UDIV
10  001111  SDIV
10  010000  ADDCC
10  010001  ANDCC
10  010010  ORCC
10  010011  XORCC
10  010100  SUBCC
10  010101  ANDNCC
10  010110  ORNCC
10  010111  XNORCC
10  011000  ADDXCC
10  011010  UMULCC
10  011011  SMULCC
10  011100  SUBXCC
10  011110  UDIVCC
10  011111  SDIVCC
10  100000  TADDCC
10  100001  TSUBCC
10  100010  TADDCCTV
10  100011  TSUBCCTV
10  100100  MULSCC
10  100101  SLL
10  100110  SRL
10  100111  SRA
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op  rd     op3     instruction
10         101000  RDY
10         101000  RDASR‡
10  00000  101000  STBAR
10  00000  101000  SIGM*
10         101001  RDPSR†
10         101010  RDWIM†
10         101011  RDTBR†

10  00000  110000  WRY
10         110000  WRASR‡
10         110001  WRPSR†
10         110010  WRWIM†
10         110011  WRTBR†


† - privileged instruction.

‡ - privileged instruction if the source/destination register is privileged.

* - instruction specific to the SSP.

Φ - SuperSPARC does not support quad-precision operands. These
instructions will cause an unimplemented FPop trap. These instructions
may be implemented in software.
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op  op3     opf        instruction
10  110100  000101010  FSQRTd
10  110100  000101011  FSQRTqΦ
10  110100  001000001  FADDs
10  110100  001000010  FADDd
10  110100  001000011  FADDqΦ
10  110100  001000101  FSUBs
10  110100  001000110  FSUBd
10  110100  001000111  FSUBqΦ
10  110100  001001001  FMULs
10  110100  001001010  FMULd
10  110100  001001011  FMULqΦ
10  110100  001001101  FDIVs
10  110100  001001110  FDIVd
10  110100  001001111  FDIVqΦ
10  110100  001101001  FsMULd
10  110100  001101110  FdMULqΦ
10  110100  011000100  FiTOs
10  110100  011000110  FdTOs
10  110100  011000111  FqTOsΦ
10  110100  011001000  FiTOd
10  110100  011001001  FsTOd
10  110100  011001011  FqTOdΦ
10  110100  011001100  FiTOqΦ
10  110100  011001101  FsTOqΦ
10  110100  011001110  FdTOqΦ
10  110100  011010001  FsTOi
10  110100  011010010  FdTOi
10  110100  011010011  FqTOiΦ
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Bit fields: op (31-30) | 00000 (29-25) | op3 (24-19) | rs1 (18-14) | opf (13-5) | rs2 (4-0)

op  op3     opf        instruction
10  110101  001010001  FCMPs
10  110101  001010010  FCMPd
10  110101  001010011  FCMPqΦ
10  110101  001010101  FCMPEs
10  110101  001010110  FCMPEd
10  110101  001010111  FCMPEqΦ
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Format 3 





10 Lbs. ЙЫНЫН НЕ JMPL 


31 29 28 24 18 13 12 4 0 

10 0000 111010 .................... TN
10 0001 111010 .................... TE
10 0010 111010 .................... TLE
10 0011 111010 .................... TL
10 0100 111010 .................... TLEU
10 0101 111010 .................... TCS
10 0110 111010 .................... TNEG
10 0111 111010 .................... TVS
10 1000 111010 .................... TA
10 1001 111010 .................... TNE
10 1010 111010 .................... TG
10 1011 111010 .................... TGE
10 1100 111010 .................... TGU
10 1101 111010 .................... TCC
10 1110 111010 .................... TPOS
10 1111 111010 .................... TVC
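The Ticc rows above can be turned into instruction words by placing the 4-bit condition in bits 28:25 alongside op3 = 111010. The sketch below assumes the immediate form (i = 1) with a 7-bit software trap number in bits 6:0, as defined by the SPARC Version 8 architecture; the function name is illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Condition field values (bits 28:25) from the Ticc table above. */
enum ticc_cond { TN = 0x0, TE = 0x1, TLE = 0x2, TL = 0x3, TLEU = 0x4,
                 TCS = 0x5, TNEG = 0x6, TVS = 0x7, TA = 0x8, TNE = 0x9,
                 TG = 0xA, TGE = 0xB, TGU = 0xC, TCC = 0xD, TPOS = 0xE,
                 TVC = 0xF };

/* op = 10, bit 29 reserved (0), cond[28:25], op3 = 111010, rs1[18:14],
   i = 1 in bit 13, software trap number in bits 6:0. */
static uint32_t encode_ticc_imm(enum ticc_cond cond, uint32_t rs1,
                                uint32_t trap_num)
{
    assert(rs1 < 32 && trap_num < 128);
    return (0x2u << 30) | ((uint32_t)cond << 25) | (0x3Au << 19) |
           (rs1 << 14) | (1u << 13) | trap_num;
}
```

With rs1 = %g0, `encode_ticc_imm(TA, 0, 0x10)` yields the familiar `ta 0x10` word, 0x91D02010.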


31 29 24 18 13 12 4 0

10 00000 111011 .................... FLUSH

10 rd 111100 .................... SAVE

10 rd 111101 .................... RESTORE
11 000000 .................... LD
11 000001 .................... LDUB
11 000010 .................... LDUH
11 000011 .................... LDD
11 000100 .................... ST
11 000101 .................... STB
11 000110 .................... STH
11 000111 .................... STD
11 001001 .................... LDSB
11 001010 .................... LDSH

11 001101 .................... LDSTUB
11 001111 .................... SWAP
11 010000 .................... LDA†
11 010001 .................... LDUBA†
11 010010 .................... LDUHA†
11 010011 .................... LDDA†
11 010100 .................... STA†
11 010101 .................... STBA†
11 010110 .................... STHA†
11 010111 .................... STDA†
11 011001 .................... LDSBA†
11 011010 .................... LDSHA†
11 011101 .................... LDSTUBA†
11 011111 .................... SWAPA†


18 13 12 4 0 
11 100000 .................... LDF
11 100001 .................... LDFSR
11 100011 .................... LDDF
11 100100 .................... STF
11 100101 .................... STFSR
11 100110 .................... STDFQ†
11 100111 .................... STDF
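A decoder for the load/store rows above can be a plain table indexed by op3. The sketch below is one possible layout (names are hypothetical); it also shows that each alternate-space form is its normal-space op3 plus 0x10, which is why all op3 values 0x10-0x1F are privileged in this list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Mnemonics for op = 11, indexed by op3; NULL = not in this summary. */
static const char *const ldst_names[64] = {
    [0x00] = "LD",    [0x01] = "LDUB",  [0x02] = "LDUH",   [0x03] = "LDD",
    [0x04] = "ST",    [0x05] = "STB",   [0x06] = "STH",    [0x07] = "STD",
    [0x09] = "LDSB",  [0x0A] = "LDSH",  [0x0D] = "LDSTUB", [0x0F] = "SWAP",
    [0x10] = "LDA",   [0x11] = "LDUBA", [0x12] = "LDUHA",  [0x13] = "LDDA",
    [0x14] = "STA",   [0x15] = "STBA",  [0x16] = "STHA",   [0x17] = "STDA",
    [0x19] = "LDSBA", [0x1A] = "LDSHA", [0x1D] = "LDSTUBA", [0x1F] = "SWAPA",
    [0x20] = "LDF",   [0x21] = "LDFSR", [0x23] = "LDDF",
    [0x24] = "STF",   [0x25] = "STFSR", [0x26] = "STDFQ",  [0x27] = "STDF",
};

static const char *ldst_mnemonic(unsigned op3)
{
    assert(op3 < 64);
    return ldst_names[op3];
}

/* Reverse lookup: op3 for a mnemonic, or -1 if it is not listed. */
static int find_op3(const char *mnemonic)
{
    for (unsigned i = 0; i < 64; i++)
        if (ldst_names[i] != NULL && strcmp(ldst_names[i], mnemonic) == 0)
            return (int)i;
    return -1;
}

/* op3 0x10-0x1F: alternate-space (privileged) forms. */
static bool is_alternate_space(unsigned op3)
{
    return (op3 & 0x30u) == 0x10u;
}
```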
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Appendix B 





This appendix provides an overview of the SuperSPARC processor's (SSP's)
ASI assignments, which are consistent with the SPARC Reference MMU
(SRMMU) and the "Suggested ASI assignments" from the SPARC Architecture
Manual, Version 8.


Because SuperSPARC does not generally transmit ASI accesses outside the
chip, system-visible ASI accesses are limited to:


❑ Transparent Memory Management Unit (MMU) mode (ASIs 0x20-0x2f)
❑ Normal data and instruction references (ASIs 0x08-0x0b)
❑ Control space accesses (ASI 0x02)


All eight bits of the ASI are decoded; an error occurs on any access to a
reserved ASI value. Each supported ASI specifies the types (LD/ST) and sizes
of accesses allowed; violations cause errors.
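The per-ASI type and size restriction can be pictured as a table lookup. This sketch transcribes only a handful of diagnostic rows from Table B-1 (it is not the complete table) and assumes that "single" means a 4-byte word and "double" an 8-byte doubleword; unknown or reserved ASIs are simply rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum asi_access { ACC_LD = 1, ACC_ST = 2, ACC_LDST = 3 };

struct asi_rule {
    uint8_t asi;
    uint8_t access; /* bit mask of enum asi_access */
    uint8_t bytes;  /* assumed: single = 4, double = 8 */
};

/* Subset of Table B-1, for illustration only. */
static const struct asi_rule asi_rules[] = {
    {0x03, ACC_LDST, 4}, /* MMU probe/flush */
    {0x04, ACC_LDST, 4}, /* MMU registers */
    {0x0D, ACC_LDST, 8}, /* instruction cache data */
    {0x0E, ACC_LDST, 8}, /* data cache tags */
    {0x0F, ACC_LDST, 8}, /* data cache data */
    {0x30, ACC_LDST, 8}, /* store buffer tags */
    {0x36, ACC_ST,   4}, /* instruction cache flash clear */
    {0x38, ACC_LDST, 8}, /* MMU breakpoint diagnostics */
};

/* True if a load or store of `bytes` to `asi` matches a row above. */
static bool asi_access_ok(uint8_t asi, bool is_store, unsigned bytes)
{
    assert(bytes == 1 || bytes == 2 || bytes == 4 || bytes == 8);
    const unsigned need = is_store ? ACC_ST : ACC_LD;
    for (unsigned i = 0; i < sizeof asi_rules / sizeof asi_rules[0]; i++) {
        if (asi_rules[i].asi == asi)
            return (asi_rules[i].access & need) != 0 &&
                   asi_rules[i].bytes == bytes;
    }
    return false; /* not in this subset: treated as an error */
}
```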


ASI 0x02 allows access to information in an external device, nominally an
external cache controller. The information may include cache controller (CC)
registers, the external cache, and its directories. These accesses are
non-cacheable, regardless of the MCNTL.AC setting. Data references to this
control space may be of any size; external system hardware is responsible
for proper data alignment. Any faults are reported as a data_access_exception.


Table B-1 lists all of the ASI values supported by the SSP. Each is explained
elsewhere in this manual; the Reference column gives the manual section that
describes the ASI.


Note:
The instructions LDSTUBA and SWAPA generate a data_access_exception on ASI
values other than 0x08-0x0b or 0x20-0x2f. Any ASI access with a size other
than that indicated in Table B-1 generates a data_access_exception.

ASI/Diagnostic Access


Table B-1. SuperSPARC Processor ASI Assignments

ASI          Name                             Access   Size     Reference
0x02         Control Space Access             LD/ST    any
0x03         MMU Probe/Flush                  LD/ST    single   9.8
0x04         MMU Registers                    LD/ST    single   9.12
0x05-0x07    Reserved                         -        -        -
0x08         User Instruction                 LD/ST
0x09         Supervisor Instruction           LD/ST
0x0a         User Data                        LD/ST
0x0b         Supervisor Data                  LD/ST
0x0d         Instruction Cache Data           LD/ST    double   10.3.3
0x0e         Data Cache Tags                  LD/ST    double   10.5.2
0x0f         Data Cache Data                  LD/ST    double   10.5.3
0x10-0x1f    Reserved                         -        -        -
0x20-0x2f    MMU Bypass                       LD/ST    all
0x30         Store Buffer Tags                LD/ST    double   10.7.1
0x33-0x35    Reserved                         -        -        -
0x36         Instruction Cache Flash Clear    ST       single   10.3.1
0x38         MMU Breakpoint Diagnostics       LD/ST    double   15.2.1
             Emulation Data Out
             Emulation Exit PC                LD/ST    single   22.4.1
0x48         Emulation Exit nPC               LD/ST    single   22.4.1
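The rule in the note above for LDSTUBA and SWAPA reduces to a range check on the ASI. The sketch below is a minimal restatement of that rule; the op3 constants come from the instruction summary in Appendix A, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define OP3_LDSTUBA 0x1Du
#define OP3_SWAPA   0x1Fu

/* LDSTUBA/SWAPA are accepted only for ASIs 0x08-0x0b (normal instruction
   and data) and 0x20-0x2f (MMU bypass); any other ASI produces a
   data_access_exception. */
static bool atomic_alt_access_ok(unsigned op3, uint8_t asi)
{
    assert(op3 == OP3_LDSTUBA || op3 == OP3_SWAPA);
    return (asi >= 0x08 && asi <= 0x0B) || (asi >= 0x20 && asi <= 0x2F);
}
```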
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Appendix C 


SuperSPARC Processor Pin Description Tables 









Table C-1, Table C-2, and Table C-3 list each pin of the SuperSPARC
processor (SSP) and describe its function. Use Table C-1 for VBus
configurations and Table C-2 for MBus configurations. Table C-3 lists power
connections.
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Table C-1. Pin Functions — VBus Interface (CCMODE = L) 


A21 
C21 
G21 
E21 
A19
C19 
E19 
G19 
A17 













Table C-1. Pin Functions — VBus Interface (CCMODE = L) (Continued)


This signal is an input indicating that system logic is prepared to accept another address or bus cycle.
This signal must be low when using the VBus interface.

H = System not ready.

L = System ready.


This signal indicates that the current address on the bus is part of a burst bus cycle.
H = Part of multi-cycle burst.
L = Not part of multi-cycle burst.


This signal indicates that the current transaction is internally cacheable.
H = Noncacheable transaction. 
L = Cacheable transaction. 


Cache Controller Mode. Selects the operation of the SSP for standalone operation or for operation with
a cache controller (such as the MXCC). The store buffer operation, data cache operation, and bus
interface (VBus or MBus) are selected by this signal. This signal must be statically asserted and
not changed during normal operation.
H = MBus interface operation is selected; data cache operates copy-back.
L = VBus interface operation is selected; data cache operates write-through.


Command strobe. Indicates the beginning of a bus cycle.
When the SSP is not in bus master mode, as indicated by WGRT and RGRT being deasserted, CMDS
is used as an input to initiate external snoop transactions (including invalidates and demaps).
H = Not a command word.
L = VBus command word on ADDR35 - ADDR00, CCHBL, CSA, DEMAP, LDST,
SIZE1 - SIZE0, SU, RD, and WR.


When the SSP is a bus master, it asserts this signal for the first cycle of a VBus transaction.
H = Not a command word.
L = VBus command word on ADDR35 - ADDR00, CCHBL, CSA, DEMAP, LDST,
SIZE1 - SIZE0, SU, RD, and WR.


This signal indicates that the current bus transaction is a control space access. It is asserted for
alternate space indicator (ASI) transactions to ASI space 0x02.
H = Normal memory or ASI access.
L = Control space access (to ASI 0x02).


† These pins are pulled inactive with weak internal resistive pull-ups.
Table C-1. Pin Functions — VBus Interface (CCMODE = L) (Continued)


Asserted with CMDS to indicate a demap cycle. As an input, indicates an external demap cycle.
When output:
H = Normal command word.
L = Demap cycle to system (system should remove TLB entries matching the request).


Data bus parity. When parity is enabled (by setting the parity enable bits in the MCNTL register), even parity
is generated and checked. When parity is disabled, odd parity is generated but parity is not checked.
DPAR0 is parity for bits DATA63 - DATA56, etc., as listed:

DPAR0  DATA63 - DATA56      DPAR1  DATA55 - DATA48
DPAR2  DATA47 - DATA40      DPAR3  DATA39 - DATA32
DPAR4  DATA31 - DATA24      DPAR5  DATA23 - DATA16
DPAR6  DATA15 - DATA08      DPAR7  DATA07 - DATA00
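The byte-lane parity scheme above can be modelled in a few lines. This sketch packs DPAR7..DPAR0 into one byte (bit i = DPARi, an arbitrary choice for illustration) and computes even parity per lane when parity is enabled, or its complement (odd parity) when it is disabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit i of the result is DPARi. DPAR0 covers DATA63-DATA56 and DPAR7
   covers DATA07-DATA00 (big-endian byte lanes). */
static uint8_t dpar_bits(uint64_t data, bool parity_enabled)
{
    uint8_t out = 0;
    for (unsigned i = 0; i < 8; i++) {
        uint8_t lane = (uint8_t)(data >> (56 - 8 * i));
        unsigned ones = 0;
        for (uint8_t b = lane; b != 0; b >>= 1)
            ones += b & 1u;
        /* Even parity: data bits plus parity bit hold an even number of
           ones. Odd parity (parity disabled) is the complement. */
        unsigned bit = ones & 1u;
        if (!parity_enabled)
            bit ^= 1u;
        out |= (uint8_t)(bit << i);
    }
    return out;
}
```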


This signal indicates that the SSP has entered an error mode state and will take a watchdog reset trap.
H = Normal operation.
L = Error mode.


H = Programmed breakpoint event is occurring.
L = Inactive.


† These pins are pulled inactive with weak internal resistive pull-ups.
Table C-1. Pin Functions — VBus Interface (CCMODE = L) (Continued)






Interrupt request level. This field specifies the level of the highest priority interrupt request that is currently
pending. If IRL3 - IRL0 = 0000, no interrupts are pending.

Level 15 (IRL3 - IRL0 = 1111): NMI (disable all traps).
Level 14: Highest maskable interrupt.
Level 1: Lowest maskable interrupt.
Level 0: No interrupts are pending.
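The IRL encoding above can be classified directly. The second helper adds an assumption taken from the SPARC Version 8 architecture rather than from this table: a pending level is taken only when traps are enabled (PSR.ET = 1) and the level is 15 or greater than PSR.PIL. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum irl_kind { IRL_NONE, IRL_MASKABLE, IRL_NMI };

/* Classify an IRL3-IRL0 value: 0 = none, 1-14 maskable, 15 = NMI. */
static enum irl_kind irl_classify(unsigned irl)
{
    assert(irl <= 15);
    if (irl == 0)
        return IRL_NONE;
    return irl == 15 ? IRL_NMI : IRL_MASKABLE;
}

/* Assumed SPARC V8 rule: taken if ET = 1 and (level 15 or level > PIL). */
static bool irl_taken(unsigned irl, unsigned pil, bool et)
{
    assert(pil <= 15);
    enum irl_kind k = irl_classify(irl);
    if (!et || k == IRL_NONE)
        return false;
    return k == IRL_NMI || irl > pil;
}
```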


This signal indicates an atomic load/store (LDSTUB, LDSTUBA, SWAP, or SWAPA) operation. It is
equivalent to the logical OR of the RD and WR signals.

H = No LDST.

L = Atomic load/store (LDST) cycle.


This signal is encoded with RRDY or WRDY and with RETRY to indicate the type of acknowledgment.


MEXC  RRDY/WRDY  RETRY  Description
H     H          H      No reply
H     H          L      Retry
H     L          H      Data transfer complete
H     L          L      Undefined error (UD)
L     H          H      Bus error (BE)
L     H          L      Timeout error (TO)
L     L          H      Reserved
L     L          L      Reserved
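A system model or logic-analyzer script might decode the acknowledgment table above as follows. Pin levels are passed as booleans (true = H); RDY stands for RRDY on reads or WRDY on writes. The enum and function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

enum vbus_ack { ACK_NO_REPLY, ACK_RETRY, ACK_DATA_DONE, ACK_UNDEFINED_ERR,
                ACK_BUS_ERR, ACK_TIMEOUT_ERR, ACK_RESERVED };

/* true = H, false = L. */
static enum vbus_ack decode_vbus_ack(bool mexc, bool rdy, bool retry)
{
    unsigned code = (mexc ? 4u : 0u) | (rdy ? 2u : 0u) | (retry ? 1u : 0u);
    switch (code) {
    case 7: return ACK_NO_REPLY;      /* H H H */
    case 6: return ACK_RETRY;         /* H H L */
    case 5: return ACK_DATA_DONE;     /* H L H */
    case 4: return ACK_UNDEFINED_ERR; /* H L L */
    case 3: return ACK_BUS_ERR;       /* L H H */
    case 2: return ACK_TIMEOUT_ERR;   /* L H L */
    default: return ACK_RESERVED;     /* L L H and L L L */
    }
}
```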


As an output, this signal controls the pipelined output enable of the external cache SRAM. As an input, it is
used to prevent bus collisions.

H = SRAM outputs disabled.

L = SRAM outputs enabled.


This signal indicates that at least one outstanding write operation has not completed.
H = System has no incomplete write operations outstanding from this processor.
L = System has write operations that were issued by this processor that are not yet complete.


H = A valid memory reference occurred in the E0 stage of the previous clock cycle.
L = No valid memory reference occurred in the E0 stage of the previous clock cycle.
H = A valid floating-point operation occurred in the E0 stage of the previous clock cycle.
L = No valid floating-point operation occurred in the E0 stage of the previous clock cycle.


H = A valid control transfer instruction was executed in the E0 stage of the previous clock
cycle.
L = No valid control transfer instruction was executed in the E0 stage of the previous clock cycle.


H = Indicates that no instructions were available when the group currently at the WB stage was
in the D0 stage.
L = Indicates that one or more instructions were available in this group.


H = The pipeline is being held by the data cache (generally processing a cache miss).
L = The pipeline is not being held by the data cache.


H = The pipeline is being held by the FPU (either the queue is full or there are dependencies).
L = The pipeline is not being held by the FPU.


H = Indicates that the branch in the E0 stage of the previous cycle was taken.
L = Indicates that the branch in the E0 stage of the previous cycle was not taken.
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Table C—1. Pin Functions — VBus Interface (CCMODE = L) (Continued) 




Indicates the number of instructions in the E0 stage of the current cycle:


PIPE2 - PIPE1   Instructions in E0 Stage
00              None
01              1
10              2
11              3
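A trace tool could turn sampled PIPE2-PIPE1 values into an instruction count. This sketch assumes that code 11 means three instructions (SuperSPARC groups hold at most three) and that each sample is stored as a 2-bit value (PIPE2 in bit 1, PIPE1 in bit 0); both helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PIPE2-PIPE1 as a binary count: 00 none, 01 one, 10 two, 11 three. */
static unsigned e0_group_size(bool pipe2, bool pipe1)
{
    return (pipe2 ? 2u : 0u) + (pipe1 ? 1u : 0u);
}

/* Sum instructions over n samples; each sample holds PIPE2:PIPE1. */
static unsigned total_e0_instructions(const uint8_t codes[], unsigned n)
{
    unsigned total = 0;
    for (unsigned i = 0; i < n; i++) {
        assert(codes[i] <= 3);
        total += e0_group_size((codes[i] & 2u) != 0, (codes[i] & 1u) != 0);
    }
    return total;
}
```

Dividing the total by the number of sampled cycles gives an approximate instructions-per-cycle figure for the captured window.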


H = Indicates that there is an exception or interrupt being signalled in the current cycle.
L = Indicates that there is not an exception or interrupt being signalled in the current cycle.


This pin is used to bypass the internal phase lock loop. When this pin is asserted, the external clock input
is routed directly to the internal clock distribution with no delay compensation.

H = PLL enabled. Normal operation.

L = PLL disabled. No clock delay compensation.


The SSP drives RD to qualify addresses on the VBus as READ cycles. It is also asserted with WR for swap
cycles and with DEMAP for demap cycles. As an input, it is used for internal SRAM test only.

H = Not a read cycle.

L = Read (or load/store with WR and LDST low) cycle.


Reset. This causes an external reset of the SSP. At power-on, RESET must be held low for at least 100
ms to allow the PLL to stabilize. If the PLL is known to be stable, RESET may be asserted for as short
as 8 cycles. See reset operation.
H = Normal operation.
L = The SSP is externally reset.
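Board bring-up code sometimes needs the reset hold time in clock cycles. The sketch below converts the two requirements above (100 ms at power-on, 8 cycles with a stable PLL) into a cycle count, assuming "cycles" are periods of the input clock whose frequency the caller supplies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Minimum RESET low time in input-clock cycles (assumed clock domain). */
static uint64_t min_reset_cycles(uint64_t clock_hz, bool pll_stable)
{
    assert(clock_hz > 0);
    if (pll_stable)
        return 8;
    return (clock_hz + 9) / 10; /* ceil(clock_hz * 0.1 s) */
}
```

At an illustrative 40 MHz input clock, the power-on requirement works out to 4,000,000 cycles.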


This signal indicates that the SSP has been given a grant to use the VBus for read operations. 
Н = VBus not available for read operations. 
L = VBus available for read operations. 


This signal indicates that incoming read data is valid. RRDY may be connected to WRDY when only a
single ready signal is required. This signal is encoded with MEXC and RETRY. See the MEXC description for the table.


These bits indicate the transfer size of the current transaction. 
00 = Byte 
01 = Half Word 
10 = Word 
11 = Doubleword 
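The SIZE1-SIZE0 field is a base-2 logarithm of the transfer size in bytes, so converting in either direction is a shift. A minimal sketch with hypothetical helper names:

```c
#include <assert.h>

/* SIZE1-SIZE0 code to bytes: 00 byte, 01 halfword, 10 word, 11 doubleword. */
static unsigned size_code_to_bytes(unsigned code)
{
    assert(code <= 3);
    return 1u << code;
}

/* Inverse mapping for a 1-, 2-, 4-, or 8-byte transfer. */
static unsigned bytes_to_size_code(unsigned bytes)
{
    assert(bytes == 1 || bytes == 2 || bytes == 4 || bytes == 8);
    unsigned code = 0;
    while ((1u << code) < bytes)
        code++;
    return code;
}
```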


This signal indicates that the current bus transaction is a supervisor transaction.
H = User (unprivileged) transaction 
L = Supervisor (privileged) transaction 


† These pins are pulled inactive with weak internal resistive pull-ups.
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Table C—1. Pin Functions — VBus Interface (CCMODE = L) (Continued) 


This pin can be used for board-level testing.
H = Normal operation.
L = All outputs except ESB and TDO are placed in a high-impedance state.




























These signals directly control the write enable signals of the synchronous SRAM used for the external cache.
These signals are driven only when asserted; otherwise, they are three-stated. WE bit ordering corresponds
to the big-endian convention (i.e., WE0 is the write enable for byte 0 (DATA63 - DATA56)).

H = SRAM read.

L = SRAM write.
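Because WE0 corresponds to byte 0 (the most significant byte, DATA63 - DATA56) under the big-endian convention above, the byte at offset k within a doubleword uses WEk. The sketch below computes which WE lines a naturally aligned store asserts, assuming one WE line per byte lane of a doubleword-wide SRAM; bit i set means WEi is asserted (driven low).

```c
#include <assert.h>
#include <stdint.h>

/* Bit i set => WEi asserted. Assumes a doubleword-wide SRAM with one
   WE line per byte lane, and a naturally aligned store. */
static uint8_t we_mask(uint32_t addr, unsigned bytes)
{
    assert(bytes == 1 || bytes == 2 || bytes == 4 || bytes == 8);
    assert(addr % bytes == 0); /* SPARC requires natural alignment */
    unsigned first = addr & 7u; /* byte offset within the doubleword */
    return (uint8_t)(((1u << bytes) - 1u) << first);
}
```

For example, a word store to an address ending in 0x4 asserts WE4 - WE7 (DATA31 - DATA00).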


This pin is used to control the assertion of the WE7 - WE0 signals.
H = May not drive WE7 - WE0.
L = May drive WE7 - WE0.


The SSP drives WR to qualify addresses on the bus as write cycles. It is asserted with RD for swaps as well as
demap cycles. As an input, this signal is used to qualify invalidation requests.

H = Not a write cycle.

L = Write (or load/store with RD and LDST low) cycle.


This signal indicates that incoming read data is valid. WRDY may be connected to RRDY when only a
single ready signal is required. This signal is encoded with MEXC and RETRY. See the MEXC description for the
table.





















This signal grants the SSP bus access for write operations. WGRT may be connected to RGRT when only
a single grant line is required.

H = VBus not available for write operations.
L = VBus available for write operations.


Not used in the VBus interface.


† These pins are pulled inactive with weak internal resistive pull-ups.
Table C-2. Pin Functions — MBus Interface (CCMODE = H)


In error mode, the SSP performs an automatic watchdog reset. Error mode is entered when any exception
is taken with traps disabled (PSR.ET = 0). This signal is driven only when asserted; otherwise, it is
three-stated.

H = Normal operation.

L = Error mode.


Cache Controller Mode. Selects the operation of the SSP for standalone operation or for operation with
a cache controller (such as the MXCC). The store buffer operation, data cache operation, and bus
interface (VBus or MBus) are selected by this signal. This signal must be statically asserted and not
changed during normal operation.
H = MBus interface operation is selected; data cache operates copy-back.
L = VBus interface operation is selected; data cache operates write-through.


Execution strobe output.
H = Programmed breakpoint event is occurring.
L = Inactive.
† These pins are pulled inactive with weak internal resistive pull-ups.
‡ These pins have an open drain.
Table C-2. Pin Functions — MBus Interface (CCMODE = H) (Continued)


Multiplexed Command / Data. 
Table C-2. Pin Functions — MBus Interface (CCMODE = H) (Continued)


K2 


Multiplexed Command / Data. 


MBus Address Strobe. Asserted by the bus master when an MBus command word (containing address
and control information) is on MAD63 - MAD0.

H = No command word.

L = MBus command word on MAD63 - MAD0.


MBus Busy. Asserted when there is any active transaction on the MBus.
H = MBus free.
L = MBus busy.


MBus Grant. This is a dedicated (not bused) signal from the MBus arbiter to this bus master.
H = Not granted. The SSP may not initiate an MBus transaction.
L = Granted. The SSP may initiate an MBus transaction as soon as the MBus is free.


MBus Request. This is a dedicated (not bused) signal from this bus master to the MBus arbiter.
H = No request.
L = Requesting to initiate a transaction on the MBus.
Table C-2. Pin Functions — MBus Interface (CCMODE = H) (Continued)


MBus Error. Encoded along with MRDY and MRTY to indicate the acknowledgment type (the type of error
response).


MERR  MRDY  MRTY  Description
H     H     H     Idle cycle
H     H     L     Relinquish and retry
H     L     H     Valid data transfer
H     L     L     Reserved
L     H     H     Bus error (ERROR1)
L     H     L     Timeout error (ERROR2)
L     L     H     Uncorrectable error (ERROR3)
L     L     L     Retry
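The MBus acknowledgment can be decoded by treating MERR:MRDY:MRTY as a 3-bit index into the table above. This sketch assumes the H/L columns follow the same counting order as the MEXC table in Table C-1; names are illustrative, and pin levels are passed as booleans (true = H).

```c
#include <assert.h>
#include <stdbool.h>

enum mbus_ack { MB_IDLE, MB_RELINQUISH_RETRY, MB_VALID_DATA, MB_RESERVED,
                MB_ERROR1_BUS, MB_ERROR2_TIMEOUT, MB_ERROR3_UNCORRECTABLE,
                MB_RETRY };

static enum mbus_ack decode_mbus_ack(bool merr, bool mrdy, bool mrty)
{
    static const enum mbus_ack table[8] = {
        MB_RETRY,                /* L L L */
        MB_ERROR3_UNCORRECTABLE, /* L L H */
        MB_ERROR2_TIMEOUT,       /* L H L */
        MB_ERROR1_BUS,           /* L H H */
        MB_RESERVED,             /* H L L */
        MB_VALID_DATA,           /* H L H */
        MB_RELINQUISH_RETRY,     /* H H L */
        MB_IDLE,                 /* H H H */
    };
    return table[(merr ? 4 : 0) | (mrdy ? 2 : 0) | (mrty ? 1 : 0)];
}
```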


MBus Module ID. The identifier of this MBus device. Usually hardwired by the system. MID3 is the Most 
Significant bit (MSb) and MIDO is the Least Significant bit (LSb). 


Memory Inhibit. Asserted by a snooping cache when it notices a coherent read of a cache block it owns.
Memory responds to this signal by ignoring the request.
H = No memory inhibit.
L = Inhibit memory. The snooping cache that asserted MIH responds with the data in
place of memory.


Interrupt Request Level. This field specifies the level of the highest priority interrupt request that is currently
pending. If MIRL3 - MIRL0 = 0000, no interrupts are pending.
Level 15 (MIRL3 - MIRL0 = 1111): NMI (disable all traps).
Level 14: Highest maskable interrupt.
Level 1: Lowest maskable interrupt.
Level 0: No interrupts are pending.


MBus Ready. Encoded along with MERR and MRTY to indicate the acknowledgment type (the type of error
response). See the table in the MERR description.
MBus Retry. Encoded along with MERR and MRDY to indicate the acknowledgment type (the type of error
response). See the table in the MERR description.


Memory Shared. Asserted by a snooping cache when it notices a coherent read of a cache block it is
caching. Both caches mark the data as shared.

H = No sharing.

L = Shared data.


This signal is generally used in the VBus interface only. It indicates to the SSP that at least one outstanding
write operation has not completed.
H = System has no write operations outstanding from this processor.
L = System has write operations that were issued by this processor that are not yet
complete.


H = A valid floating-point operation occurred in the E0 stage of the previous clock cycle.
L = No valid floating-point operation occurred in the E0 stage of the previous clock cycle.
† These pins are pulled inactive with weak internal resistive pull-ups.
‡ These pins have an open drain.
Table C-2. Pin Functions — MBus Interface (CCMODE = H) (Continued)


H = A valid control transfer instruction was executed in the E0 stage of the previous clock
cycle.
L = No valid control transfer instruction was executed in the E0 stage of the previous clock
cycle.


H = Indicates that no instructions were available when the group currently at the WB stage
was in the D0 stage.
L = Indicates that one or more instructions were available in this group.


H = The pipeline is being held by the FPU (either the queue is full or there are dependencies).
L = The pipeline is not being held by the FPU.

H = Indicates that the branch in the E0 stage of the previous cycle was taken.
L = Indicates that the branch in the E0 stage of the previous cycle was not taken.


Indicates the number of instructions in the E0 stage of the current cycle:
PIPE2 - PIPE1   Instructions in E0 Stage
00              None
01              1
10              2
11              3


H = Indicates that there is an exception or interrupt being signalled in the current cycle.
L = Indicates that there is not an exception or interrupt being signalled in the current cycle.


This pin is used to bypass the internal phase lock loop. When this pin is asserted, the external clock input
is routed directly to the internal clock distribution with no delay compensation.
H = PLL enabled. Normal operation.
L = PLL disabled. No clock delay compensation.


Not used. Should be tied high or left floating during normal chip operation.


Not used. Should be tied high or left floating during normal chip operation.


Reset In. This causes an external reset of the SSP. At power-on, RSTIN must be held low for at least 100
ms to allow the PLL to stabilize. If the PLL is known to be stable, RSTIN may be asserted for as short
as 8 cycles. See reset operation.
H = Normal operation.
L = The SSP is externally reset.












This pin can be used for board-level testing.
H = Normal operation.
L = All outputs except ESB and TDO are placed in a high-impedance state.


JTAG test mode select input.
† These pins are pulled inactive with weak internal resistive pull-ups.
Table C-2. Pin Functions — MBus Interface (CCMODE = H) (Continued)


VPLLRC: Phase locked loop filter capacitor. This pin should be connected to an external 0.1 µF capacitor.


AJ27
AK28


AE29
AF34
AF2 Not used for MBus.


† These pins are pulled inactive with weak internal resistive pull-ups.
Table C-2. Pin Functions — MBus Interface (CCMODE = H) (Continued)


† These pins are pulled inactive with weak internal resistive pull-ups.
Table C-3. Pin Functions — Power Connections


SIGNAL   PIN NUMBER   DESCRIPTION


K4, P4, AB4, AF4, AM10, AM14, AM22, AM26, AF32, AB32, P32, K32, D26,
СЛИНИ Е PO EOE | ots roe ee 



















eux! С СИ ҮТ 
VCCI | F6, V2, AK6, AP18, AK30, V34, F30, B18 | +5 volts for input buffers


+5 volts for peripheral logic. 
AD32, Y32, T32, M32, H32, D28, D24, D20, D16, D12, D8. Ground for core logic


for clock and PLL 






M6, P2, T6, Y6, AB2, AD6, AK12, AP14, AK16, AK20, AP22, AK24, AD30,
AB34, Y30, T30, P34, M30, F24, B22, F20, F16, B14, F12 






H4, M4, T4, Y4, AD4, AH4, AM8, AM12, AM16, AM20, AM24, AM28, AH32,


Ground 
G7, V4, AJ7, AM18, AJ29, V32, G29, D18 Ground for input buffers 










K6, M2, P6, T2, V6, Y2, AB6, AD2, AF6, AK10, AP12, AK14, AP16, AK18,
AP20, AK22, AP24, AK26, AF30, AD34, AB30, Y34, V30, T34, P30, M34, K30,
F26, B24, F22, B20, F18, B16, F14, B12, F10


S 
Voce 
VCCCLK
VCCI
Table D-1, Table D-2, and Table D-3 list each pin of the MultiCache Controller
(MXCC) and describe its function. Use Table D-1 for configurations of the
MXCC using MBus and Table D-2 for configurations using XBus. Table D-3
lists power connections.


Table D-1. Pin Functions — MBus Configuration (MBSEL = H)





† These pins have internal holding drivers.
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MultiCache Controller Pin Description Tables 


Table D-1. Pin Functions — MBus Configuration (MBSEL = H) (Continued)


Indicates either an internal MXCC error or that ERROR is asserted by the processor.
H = No error
L = An internal processor or MXCC error


Capacitor for the phase filter of the bus clock PLL. This pin should be connected to an external capacitor
to ground. With an internal resistor, this circuit provides the RC time constant for the phase filter of the bus
clock domain PLL.


Indicates whether a burst access is in progress. BURST is driven at the same time as ADDR35 - ADDR0,
and it is asserted during both read bursts and write bursts. BURST is deasserted on the last address of a
burst to allow the MXCC to stop returning RRDY or WRDY with the last data of the burst.
H = A burst access is in progress.
L = A burst access is not in progress.


Cacheable access. This pin indicates that the current processor transaction is one that may be cached in an
external cache.

H = Noncacheable access.

L = Cacheable access.
Command strobe. Indicates the beginning of a bus cycle. The VBus master asserts this signal for one cycle
to begin each of its accesses.
When the MXCC is a bus master, as indicated by WGRT and RGRT being deasserted, it asserts CMDS to
initiate invalidate and demap transactions.

H = Not a command word

L = VBus invalidate or demap command word on ADDR35 - ADDR0, DEMAP, and WR.


When the MXCC is not a bus master, this signal indicates the first cycle of a VBus transaction.
H = Not a command word.
L = VBus command word on ADDR35 - ADDR0, CCHBL, CSA, DEMAP, LDST, SIZE1 - SIZE0,


Control-space access. The processor asserts this signal when performing a read or write to the internal tag
RAM, E-cache, or registers of the MXCC.
H = Normal memory access.





† These pins have internal holding drivers.
‡ These pins have internal pullup resistors.
Table D-1. Pin Functions — MBus Configuration (MBSEL = H) (Continued)




† These pins have internal holding drivers.
‡ These pins have internal pullup resistors.
Table D-1. Pin Functions — MBus Configuration (MBSEL = H) (Continued)


Processor data bus (continued). 


INPUT: Asserted with a Demap Data Word on DATA63 - DATA0 and WR asserted to pass a demap request
from the processor to the MXCC, and then to the system bus. DEMAP asserted with RD asserted indicates
that the processor has successfully completed a demap operation requested by the MXCC (initiated from the
system bus).

H = No demap request

L = With WR: the processor has requested a demap cycle.

With RD: the processor has completed a system-bus-requested demap cycle.

OUTPUT: Asserted when the system bus has requested a demap operation. DATA63 - DATA0 contains a
Demap Data Word indicating which virtual address translations are to be discarded.

H = No demap request

L = System-bus-requested demap cycle


Data bus parity. When parity is enabled, even parity is generated and checked. DPAR0 is parity for bits
DATA63 - DATA56. When parity checking is disabled, odd parity is generated but not checked.


DPAR0: DATA63 - DATA56    DPAR1: DATA55 - DATA48
DPAR2: DATA47 - DATA40    DPAR3: DATA39 - DATA32
DPAR4: DATA31 - DATA24    DPAR5: DATA23 - DATA16
DPAR6: DATA15 - DATA08    DPAR7: DATA07 - DATA00


Processor error. The processor asserts this pin when it has entered an internal error state. The MXCC initiates
an internal reset when ERROR is asserted.

H = Normal operation

L = Processor internal error





† These pins have internal holding drivers.
‡ These pins have internal pullup resistors.
Table D-1. Pin Functions — MBus Configuration (MBSEL = H) (Continued)


Interrupt request level. This field specifies to the processor the level of the highest priority interrupt request
that is currently pending. If IRL3 - IRL0 = 0000, no interrupts are pending.

Level 15 (IRL3 - IRL0 = 1111): Nonmaskable interrupt.

Level 14: Highest maskable interrupt.


Level 1: Lowest maskable interrupt.
Level 0: No interrupts are pending.


This signal indicates an atomic load/store (LDSTUB, LDSTUBA, SWAP, or SWAPA) operation. It is
equivalent to the logical OR of the RD and WR signals. No other transactions may occur while LDST is asserted.
H = No LDST





† These pins have internal holding drivers.
‡ These pins have internal pullup resistors.
Table D-1. Pin Functions — MBus Configuration (MBSEL = H) (Continued)


MBus multiplexed command / data bus.
Table D-1. Pin Functions — MBus Configuration (MBSEL = H) (Continued)


MBus multiplexed command / data bus (continued). 


MBus address strobe. Asserted by the current master when a valid address/command is present on
MAD63 - MAD0.

H = A valid address/command is not present on MAD63 - MAD0

L = A valid address/command is present on MAD63 - MAD0


MBus busy. MBB is a three-state signal that is asserted by the current bus master as long as the current bus
master is using the MBus.

Rising edge = Completed MBus transaction (released)

Hi-Z = MBus not busy

L = MBus is busy


MBus grant. This is a nonbused signal from the MBus arbiter to a potential master. It is asserted by the external
arbiter when this master has been granted the MBus.

H = The MXCC is not granted the MBus

L = The MXCC is granted the MBus


MBus request. This is a nonbused signal to the MBus arbiter. It is asserted by the MXCC when it needs to
access the MBus.

H = The MXCC does not need to access the MBus

L = The MXCC needs to access the MBus


MBus select. This signal is used to select the system bus interface. This signal should not be changed during
operation of this device.

H = MBus system interface

L = XBus system interface


MBus error. Encoded along with MRDY and MRTY to indicate the acknowledgment type (the type of error
response).

MERR  MRDY  MRTY  Description
H     H     H     Idle cycle
H     H     L     Relinquish and retry
H     L     H     Valid data transfer
H     L     L     Reserved
L     H     H     Bus error (ERROR1)
L     H     L     Timeout error (ERROR2)
L     L     H     Uncorrectable error (ERROR3)
L     L     L     Retry


† These pins have internal holding drivers.
‡ These pins have internal pullup resistors.
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Table D—1. Pin Functions — MBus Configuration (MBSEL = H) (Continued) 













Memory exception. This signal is asserted when the memory controller could not return or accept the re- 
quested data. This signal may cause the processor to take a memory exceplion trap. This signal is encoded 
with RADY or WARDY, and RETRY to indicate the type of acknowledgment. 


MEXC . RRDYWRDY RETRY 


H H H No reply 

H H L Retry 

H L H Data transfer complete 
H L L Undefined error (UD) 
L H H Bus error (BE) 

L H L Timeout error (TO) 

L L H Reserved 

L L L Reserved 


MBus module ID. The identifier of this MBus device and is usually hardwired by the system. MID3 is tha most 
significant bit (MSb) and MIDO is the least significant bit (LSb). 









MBus memory inhibit. This signal is asserted by a snooping cache during coherent reads when it finds it has 
the dirty copy of cacheable data. When MIH is asserted during an MBus transaction, memory (slave) is 
inhibited from responding and the snooping cache supplies the data instead. The MXCC can assert this signal 
when snooping MBus transactions. It senses this signal when it is either the master or a slave on an MBus 
transaction. 

H = Normal operation. 

L = Inhibit memory, snooping cache to supply data. 


MBus system interrupt request level. This field specifies the level of the highest priority interrupt request that 
is currently pending. If MIRL3-MIRL0 = 0000, no interrupts are pending. 
Level 15 (MIRL3-MIRL0 = 1111): NMI (disable all traps). 

Level 14: Highest maskable interrupt. 

Level 1: Lowest maskable interrupt. 

Level 0: No interrupts are pending. 
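The level encoding can be summarized in a small C sketch. `irl_classify` and `irl_is_taken` are hypothetical helpers; the second one encodes an assumption drawn from the SPARC Version 8 architecture (with traps enabled, a pending level is taken when it is 15 or exceeds the processor interrupt level, PIL), not from this table.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of a 4-bit interrupt request level
 * (MIRL3-MIRL0 on the MBus, or IRL3-IRL0 toward the processor). */
typedef enum { IRQ_NONE, IRQ_MASKABLE, IRQ_NMI } irq_kind;

static irq_kind irl_classify(unsigned level)
{
    assert(level <= 15);          /* the field is four bits wide */
    if (level == 0)
        return IRQ_NONE;          /* level 0: nothing pending */
    if (level == 15)
        return IRQ_NMI;           /* level 15: nonmaskable */
    return IRQ_MASKABLE;          /* levels 1 (lowest) to 14 (highest) */
}

/* Assumption (SPARC V8 rule): with traps enabled (ET = 1), a pending level
 * is taken if it is 15 or is greater than the processor interrupt level. */
static bool irl_is_taken(unsigned level, unsigned pil, bool traps_enabled)
{
    if (!traps_enabled || level == 0)
        return false;
    return level == 15 || level > pil;
}
```

With PIL = 5, for example, level 6 is taken but level 5 is not, while level 15 is taken regardless of PIL.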


MBus ready. Encoded along with MERR and MRTY to indicate acknowledgment type (the type of error 
response). See table in MERR description. 
MBus retry. Encoded along with MERR and MRDY to indicate acknowledgment type (the type of error 
response). See table in MERR description. 


MBus shared. This is asserted by snooping caches that have a valid entry matching the address of the 
current bus transaction. 
H = No snooping cache has a valid entry which matches the address of the current bus transaction. 
L = One or more snooping caches have a valid entry which matches the address of the current bus 
transaction. 
SRAM output enable. As an output, this signal controls the pipelined output enable of external cache SRAM. 
It is used as an input to prevent bus collisions. 
H = SRAM outputs disabled 
L = SRAM outputs enabled 


Processor clock (PCLK). This is the same clock that is supplied to the processor. 
§ These pins have an open drain. 
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Table D—1. Pin Functions — MBus Configuration (MBSEL = H) (Continued) 


Pending. A store is pending in the MXCC or on the MBus. This signal is asserted by the MXCC when it has 
a store operation pending internally or on the system bus. This signal indicates that at least one outstanding 
write operation has not completed. 

H = All write operations issued by this processor are completed. 

L = One or more write operations that were issued by this processor are not yet complete. 


PLL bypass. This pin is used to bypass both of the internal phase lock loops. When PLLBYP is asserted, 
PCLK directly supplies timing for the circuits in the MXCC's processor clock domain, and BCLK directly 
supplies timing for the circuits of the MXCC's bus clock domain. The normal delay compensation performed by 
the PLL is defeated. 

H = PLLs are enabled. Normal operation. 

L = PLLs are disabled. No clock delay compensation. 


Capacitor for the phase filter of the processor clock PLL. This pin should be connected to an external 
capacitor to ground. With an internal resistor, this circuit provides the RC time constant for the phase filter of the 
processor clock domain PLL. 


This signal is asserted when a read address is on ADDR35-ADDR0. It is also asserted with DEMAP to indicate 
completion of a bus demap operation by the processor. 
H = No read. 
L = With DEMAP: demap operation requested by the MXCC is complete. 
Without DEMAP: a data read request. 
With LDST and WR: an atomic load/store operation. 


Reset. This MXCC output is used to reset the processor when the system asserts RSTIN. 


H = Normal operation. 
L = Reset to processor. 


Retry. This signal is encoded, along with RRDY or WRDY, and MEXC to indicate the type of 
acknowledgment. See MEXC description for table. If this signal is asserted before RRDY or WRDY is 
asserted for an access, the processor should terminate the current access and restart it once it reacquires the 
VBus (if a processor read is pending, a processor write will not be retried until after the read has completed). 
Read grant. This signal grants the processor read access on the VBus. 

H = Processor not allowed read access. 

L = Processor may make read accesses. 


Read ready. This signal indicates that read data is valid. When RRDY is asserted, the processor may reliably 
sample the incoming data on the same clock edge as RRDY. This signal is used to qualify data specifically 
for a read access since a write may also be pending. This signal is encoded with MEXC and RETRY. See 
MEXC description for table. 


Reset in. Reset from the system to the cache controller. 
H = Normal operation. 
L = Hardware reset (see reset section). 


Size of data transfer. These bits indicate the transfer size of the current bus transaction initiated by the 
processor. 

SIZE1-SIZE0 = 00 for byte transfer 

SIZE1-SIZE0 = 01 for halfword transfer 

SIZE1-SIZE0 = 10 for word transfer 

SIZE1-SIZE0 = 11 for doubleword transfer 


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Table D—1. Pin Functions — MBus Configuration (MBSEL = H) (Continued) 


Supervisor access. This signal is asserted by the processor with CMDS when the access was initiated in 
supervisor mode. 

H = User (unprivileged) transaction. 

L = Supervisor (privileged) transaction. 
Synchronous clocks. When this signal is asserted, the synchronizers are bypassed, eliminating their delay 
but requiring that BCLK and PCLK be identical. 

H = Asynchronous. PCLK and BCLK may have different rates. 

L = Synchronous. PCLK and BCLK must be identical. 


JTAG test-access port signals. One of these pins is the JTAG data output, which can alternatively carry the 
PLL output for test. 


SRAM write enables. These signals directly control the write enable signals of synchronous SRAM used as 
external cache. These signals are driven only when asserted; otherwise they are in the high-impedance 
state. WEx bit ordering corresponds to the big-endian convention. That is: 


WE0: DATA63-DATA56    WE1: DATA55-DATA48 
WE2: DATA47-DATA40    WE3: DATA39-DATA32 
WE4: DATA31-DATA24    WE5: DATA23-DATA16 
WE6: DATA15-DATA08    WE7: DATA07-DATA00 


H = SRAM read 
L = SRAM write 
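To make the big-endian lane ordering concrete, the following C sketch computes which of WE0-WE7 would be asserted for a store. It assumes naturally aligned accesses and that address bits 2-0 select the byte within the 64-bit doubleword, with offset 0 on DATA63-DATA56; `size_to_bytes` and `we_lanes` are hypothetical helpers, and the size codes follow the SIZE1-SIZE0 encoding (00 byte through 11 doubleword).

```c
#include <assert.h>
#include <stdint.h>

/* Bytes moved for each SIZE1-SIZE0 code: 00 byte, 01 halfword, 10 word, 11 doubleword. */
static unsigned size_to_bytes(unsigned size_code)
{
    assert(size_code <= 3);
    return 1u << size_code;                  /* 1, 2, 4, or 8 */
}

/* Hypothetical helper: bit i of the result stands for WEi. Big-endian lane
 * ordering puts byte offset 0 on DATA63-DATA56 (WE0) and offset 7 on
 * DATA07-DATA00 (WE7), so the mask is simply shifted by the byte offset. */
static uint8_t we_lanes(uint64_t addr, unsigned size_code)
{
    unsigned nbytes = size_to_bytes(size_code);
    unsigned offset = (unsigned)(addr & 7u); /* byte within the doubleword */
    assert(offset % nbytes == 0);            /* natural alignment assumed */
    uint8_t ones = (uint8_t)((1u << nbytes) - 1u);
    return (uint8_t)(ones << offset);
}
```

For example, a word store at an address ending in 4 gives `0xF0`, that is, WE4-WE7, which matches the DATA31-DATA00 lanes in the table.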


E-cache write enable enable. When asserted, the SSP may assert its write enables to write E-cache directly. 
This pin is used to control the assertion of the processor's WE7-WE0 signals. 

H = The processor may not drive WE7-WE0. 

L = The processor may drive WE7-WE0. 


Write grant. This signal grants the processor write access on the VBus. 
H = The processor is not allowed write access. 
L = The processor may make write accesses. 


As an input, this signal is asserted with a write address on ADDR35-ADDR0 and write data on 
DATA63-DATA0. It is also asserted by the processor with DEMAP to send a demap request to the system 
bus. 

H = Not a write cycle. 

L = Write (or load/store with RD and LDST low or demap) cycle. 


As an output, the MXCC asserts this signal with an address on ADDR35-ADDR0 to invalidate lines in the 
processor's internal cache(s) containing that address. 

H = Normal 

L = Demap 


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Table D—1. Pin Functions — MBus Configuration (MBSEL = H) (Continued) 















Write ready. When WRDY is asserted, the MXCC has sampled the processor's write data, so the processor 
may generate the next access. In the case of burst writes, the processor switches address and data for 
the next write within the burst on the same clock edge on which WRDY was asserted. This signal is used to 
qualify data specifically for a write access since a read may also be pending. This signal is encoded with MEXC 
and RETRY. See MEXC description for table. 











The remaining pins in this part of the table are not used in the MBus interface. 


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Table D-2. Pin Functions — XBus Configuration (MBSEL = L) 


Capacitor for the phase filter of the bus clock PLL. This pin should be connected to an external capacitor 
to ground. With an internal resistor, this circuit provides the RC time constant for the phase filter of the bus 
clock domain PLL. 


This signal indicates whether a burst access is in progress. BURST is driven at the same time as 
ADDR35-ADDR0, and it is asserted during both read bursts and write bursts. BURST is deasserted on the 
last address of a burst to allow the MXCC to stop returning RRDY or WRDY with the last data of the burst. 
H = A burst access is in progress. 
L = A burst access is not in progress. 


This signal indicates either an internal MXCC error or that ERROR has been asserted by the processor. 
H = No error. 
L = An internal processor or MXCC error. 





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Table D-2. Pin Functions — XBus Configuration (MBSEL = L) (Continued) 


Cacheable access. This pin identifies the current processor transaction as one that may be cached in an 
external cache. 

H = Noncacheable access. 

L = Cacheable access. 


Command strobe. Indicates the beginning of a bus cycle. The VBus master asserts this signal for one cycle 
to begin all of its accesses. When the MXCC is a bus master, as indicated by the read grant and write grant 
signals being deasserted, it asserts CMDS to initiate invalidate and demap transactions. 

H = Not a command word. 


L = VBus invalidate or demap command word on ADDR35-ADDR0, DEMAP, and WR. 


When the MXCC is not a bus master, this signal indicates the first cycle of a VBus transaction. 
H = Not a command word. 
L = VBus command word on ADDR35-ADDR0, CCHBL, CSA, DEMAP, LDST, SIZE1-SIZE0, 


Control-space access. The processor asserts this signal when performing a read or write to the internal tag 
RAM, E-cache, or registers of the MXCC. 





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Table D-2. Pin Functions — XBus Configuration (MBSEL = L) (Continued) 


Processor data bus. 





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Table D-2. Pin Functions — XBus Configuration (MBSEL = L) (Continued) 


Processor data bus (continued). 


As an input, this signal is asserted with a demap data word on DATA63-DATA0 and WR asserted to pass 
a demap request from the processor to the MXCC, and then to the system bus. DEMAP asserted with RD 
asserted indicates that the processor has successfully completed a demap operation requested by the MXCC 
(initiated from the system bus). 

H = No demap request. 

L = With WR: the processor has requested a demap cycle. 

With RD: the processor has completed a demap cycle requested by the system bus. 

As an output, this signal is asserted when the system bus has requested a demap operation. 
DATA63-DATA0 contains the demap data word indicating which virtual address translations are to be 
discarded. 

H = No demap request. 

L = System bus requested demap cycle. 


Data bus parity. When parity is enabled, even parity is generated and checked. DPAR0 is parity for bits 
DATA63-DATA56. When parity checking is disabled, odd parity is generated but not checked. 


DPAR0: DATA63-DATA56    DPAR1: DATA55-DATA48 
DPAR2: DATA47-DATA40    DPAR3: DATA39-DATA32 
DPAR4: DATA31-DATA24    DPAR5: DATA23-DATA16 
DPAR6: DATA15-DATA08    DPAR7: DATA07-DATA00 
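As an illustration of the lane assignment, the following C sketch computes DPAR0-DPAR7 for a 64-bit data word. It assumes the usual definition of even parity (the parity bit makes the number of 1s across the byte and its parity bit even, so it equals the XOR of the eight data bits) and models the odd parity generated when checking is disabled as its complement; `byte_xor` and `dpar_bits` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* XOR of all eight bits: 1 if the byte holds an odd number of 1s. */
static unsigned byte_xor(uint8_t b)
{
    b ^= b >> 4;
    b ^= b >> 2;
    b ^= b >> 1;
    return b & 1u;
}

/* Hypothetical helper: bit n of the result is DPARn, which covers byte lane n
 * counted from the most significant end (DPAR0: DATA63-DATA56 ...
 * DPAR7: DATA07-DATA00). even_parity = true models parity enabled;
 * false models the odd parity generated when checking is disabled. */
static uint8_t dpar_bits(uint64_t data, bool even_parity)
{
    uint8_t result = 0;
    for (unsigned n = 0; n < 8; n++) {
        uint8_t lane = (uint8_t)(data >> (56 - 8 * n));
        unsigned p = byte_xor(lane);   /* even parity bit */
        if (!even_parity)
            p ^= 1u;                   /* odd parity is the complement */
        result |= (uint8_t)(p << n);
    }
    return result;
}
```

For example, a word whose only 1 bit is DATA56 produces DPAR0 = 1 and all other DPAR bits 0 under even parity.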


Processor error. The processor asserts this pin when it has entered an internal error state. The MXCC 
initiates an internal reset when ERROR is asserted. 
H = Normal operation. 
L = Processor internal error. 
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Table D-2. Pin Functions — XBus Configuration (MBSEL = L) (Continued) 


GTLREF: XBus level reference for GTL and GTL/TTL selection. This pin should be connected to a voltage 
source of Vref for GTL operation of the XBus interface signals, and to Vcc for TTL operation of the XBus 
interface signals. Since this pin (and GTLREF1) sets threshold levels, care should be taken to ensure that 
Vref is free of noise. GTLREF and GTLREF1 are connected together internally. 


GTLREF1: XBus level reference for GTL and GTL/TTL selection. GTLREF and GTLREF1 are connected 
together internally. 


Interrupt request level. This field specifies, to the processor, the level of the highest priority interrupt request 
that is currently pending. If IRL3-IRL0 = 0000, no interrupts are pending. 
Level 15 (IRL3-IRL0 = 1111): Nonmaskable interrupt. 
Level 14: Highest maskable interrupt. 
Level 1: Lowest maskable interrupt. 
Level 0: No interrupts are pending. 


Boot-bus command bits. Commands are issued by the MXCC and interpreted by one or more external boot-bus 
controllers. The eight LCMD2-LCMD0 commands are: 

Address bits 23-16 on LDATA 
Interrupt status on LDATA 
Address bits 15-8 on LDATA 
Address bits 7-0 on LDATA 
Idle for write 
READ-VALID: Device data on LDATA 
WRITE-VALID: MXCC data on LDATA 
IDLE: Idle 


Boot-bus command strobe. When asserted, this signal indicates that command information on LCMD (and 
write data on LDATA for WRITE-VALID commands) is valid. Input data is latched on the rising edge. 

H = Inactive. 

L = Bus command valid. 




Boot-bus address/data. 


This signal indicates an atomic load/store (LDSTUB, LDSTUBA, SWAP, or SWAPA) operation. It is 
equivalent to the logical OR of the RD and WR signals. No other transactions may occur while LDST is asserted. 
H = No LDST. 
L = Atomic load/store (LDST) cycle. 
MBus select. This signal is used to select the system bus interface. This signal should not be changed during 
operation of this device. 
H = MBus system interface. 
L = XBus system interface. 
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Table D-2. Pin Functions — XBus Configuration (MBSEL = L) (Continued) 


Memory exception. This signal is asserted when the memory controller could not return or accept the 
requested data. This signal may cause the processor to take a memory exception trap. This signal is encoded 
with RRDY or WRDY, and RETRY to indicate the type of acknowledgment. 

MEXC  RRDY/WRDY  RETRY  Meaning 
H     H          H      No reply 
H     H          L      Retry 
H     L          H      Data transfer complete 
H     L          L      Undefined error (UD) 
L     H          H      Bus error (BE) 
L     H          L      Timeout error (TO) 
L     L          H      Reserved 
L     L          L      Reserved 


SRAM output enable. As an output, this signal controls the pipelined output enable of external cache SRAM. 
It is used as an input to prevent bus collisions. 

H = SRAM outputs disabled. 

L = SRAM outputs enabled. 


Pending. A store is pending in the MXCC or in the system beyond the MXCC. This signal is asserted by the 
MXCC when it has a store operation pending internally or on the system bus. This signal indicates that at 
least one outstanding write operation has not completed. 


H = No incomplete write operations outstanding from this processor. 
L = One or more write operations issued by this processor are not yet complete. 


PLL bypass. This pin is used to bypass both of the internal phase lock loops. When PLLBYP is asserted, PCLK 
directly supplies timing for the circuits in the MXCC's processor clock domain, and BCLK directly supplies 
timing for the circuits of the MXCC's bus clock domain. The normal delay compensation performed by the 
PLL is defeated. 

H = PLLs are enabled. Normal operation. 

L = PLLs are disabled. No clock delay compensation. 


Capacitor for the phase filter of the processor clock PLL. This pin should be connected to an external 
capacitor to ground. With an internal resistor, this circuit provides the RC time constant for the phase filter of the 
processor clock domain PLL. 

This signal is asserted when a read address is on ADDR35-ADDR0. It is also asserted with DEMAP to indicate 
completion of a bus demap operation by the processor. 
H = No read. 
L = With DEMAP: demap operation requested by the MXCC is complete. 
Without DEMAP: a data read request. 
With LDST and WR: an atomic load/store operation. 


Reset. This MXCC output is used to reset the processor when the system asserts RSTIN. 
Н = Normal operation. 
L = Reset to processor. 


Retry. This signal is encoded, along with RRDY or WRDY, and MEXC to indicate the type of 
acknowledgment. See MEXC description for table. If this signal is asserted before RRDY or WRDY is 
asserted for an access, the processor should terminate the current access and restart it once it reacquires the 
VBus (if a processor read is pending, a processor write will not be retried until after the read has completed). 
Read grant. This signal grants the processor read access on the VBus. 

H = Processor not allowed read access. 

L = Processor may make read accesses. 





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Table D-2. Pin Functions — XBus Configuration (MBSEL = L) (Continued) 


Read ready. This signal indicates that read data is valid. When RRDY is asserted, the processor may reliably 
sample the incoming data on the same clock edge as RRDY. This signal is used to qualify data specifically 
for a read access since a write may also be pending. This signal is encoded with MEXC and RETRY. See 
MEXC description for table. 


Reset in. Reset from the system to the cache controller. 
H = Normal operation. 
L = Hardware reset (see reset section). 


Size of data transfer. These bits indicate the transfer size of the current bus transaction initiated by the 
processor. 


SIZE1-SIZE0 = 00 for byte transfer 
SIZE1-SIZE0 = 01 for halfword transfer 
SIZE1-SIZE0 = 10 for word transfer 
SIZE1-SIZE0 = 11 for doubleword transfer 


Supervisor access. This signal is asserted by the processor with CMDS when the access was initiated in 
supervisor mode. 

H = User (unprivileged) transaction. 

L = Supervisor (privileged) transaction. 
Synchronous clocks. When this signal is asserted, the synchronizers are bypassed, eliminating their delay 
but requiring that BCLK and PCLK be identical. 

H = Asynchronous. PCLK and BCLK may have different rates. 

L = Synchronous. PCLK and BCLK must be identical. 


JTAG test-access port and test signals. One of these pins is the JTAG data output, which can alternatively 
carry the PLL output for test. 


SRAM write enables. These signals directly control the write enable signals of synchronous SRAM used as 
external cache. These signals are driven only when asserted; otherwise they are in the high-impedance 
state. WEx bit ordering corresponds to the big-endian convention. That is: 


WE0: DATA63-DATA56    WE1: DATA55-DATA48 
WE2: DATA47-DATA40    WE3: DATA39-DATA32 
WE4: DATA31-DATA24    WE5: DATA23-DATA16 
WE6: DATA15-DATA08    WE7: DATA07-DATA00 


H = SRAM read 
L = SRAM write 


Write grant. This signal grants the processor write access on the VBus. 
H = The processor is not allowed write access. 
L = The processor may make write accesses. 
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Table D-2. Pin Functions — XBus Configuration (MBSEL = L) (Continued) 


As an input, this signal is asserted with a write address on ADDR35-ADDR0 and write data on 
DATA63-DATA0. It is also asserted by the processor with DEMAP to send a demap request to the system 
bus. 

H = Not a write cycle. 

L = Write (or load/store with RD and LDST low or demap) cycle. 


As an output, the MXCC asserts this signal with an address on ADDR35-ADDR0 to invalidate lines in the 
processor's internal cache(s) containing that address. 

H = Normal. 

L = Demap. 


Write ready. When WRDY is asserted, the MXCC has sampled the processor's write data, so the processor 
may generate the next access. In the case of burst writes, the processor switches address and data for 
the next write within the burst on the same clock edge on which WRDY was asserted. This signal is used to 
qualify data specifically for a write access since a read may also be pending. This signal is encoded with MEXC 
and RETRY. See MEXC description for table. 





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Table D-2. Pin Functions — XBus Configuration (MBSEL = L) (Continued) 


XBus multiplexed command / data bus. 





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Table D-2. Pin Functions — XBus Configuration (MBSEL = L) (Continued) 





XBus multiplexed command / data bus (continued). 













Parity bits. 
XPAR3 = Parity over XDATA63-XDATA48, XPAR2 = Parity over XDATA47-XDATA32 
XPAR1 = Parity over XDATA31-XDATA16, XPAR0 = Parity over XDATA15-XDATA0 


Grant outputs to the bus watchers, one per bus watcher (for example, grant to bus watcher 0, BW0). 


In GTL operation, the I/O buffer is open-drain, while in TTL operation the I/O buffer is 3-state. 
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Table D-3. Pin Functions — Power Connections 











Supply voltage (Vcc) for internal (core) logic: C15, C21, F10, F26, K6, K30, R3, R33, AA3, AA33, AF6, AF30, 
AK10, AK26, AN15, AN21 

Supply voltage (Vcc) for bus clock and PLL. 

Supply voltage (Vcc) for processor clock and PLL. 

Supply voltage (Vcc) for inputs: C25, G7, NaS, AC3, AJ29, AN13 

Supply voltage (Vcc) for processor outputs: 19, C9, C27, E5, E31, F16, F20, J3, J33, 74, 150, U35, W1, Y31, 
Y6, AG3, AG33, AK16, AK20, AL5, AL31 

Ground for internal (core) logic: C17, C19, F8, F12, F24, F28, H6, H30, M6, M30, U3, U33, W3, W33, AD6, 
AD30, AH6, AH30, AK8, AK12, AK24, AK28, AN17, AN19 

Ground for bus clock and PLL. 

Ground for processor clock and PLL. 

Ground for inputs: C13, 29, M3, AC33, AJ7, AN23 

Ground for processor outputs: A17, C7, C11, C25, C29, F6, F14, F18, F22, F30, G3, G33, L3, L33, P6, P30, U1, 
V6, V30, W35, AB6, AB30, AE3, AE33, AJ3, AJ33, AK6, AK14, AK18, AK22, AK30, AN7, AN11, AN29, AR19, AN25 
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This appendix contains a summary of the different SuperSPARC 
microprocessor revisions in tabular form. 
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MultiCache Controller Revision Summary 





This appendix contains a summary of the MultiCache Controller revisions in 
tabular form. 
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abort: To terminate the execution of an instruction once it has begun but 
before it has completed. Aborted instructions do not make any program-visible 
changes to the state of the processor registers or to memory. An 
aborted instruction does not change any condition codes and cannot 
cause a trap. Instructions are aborted if an earlier instruction causes a 
trap or if the instruction was started speculatively under assumptions that 
proved false, such as an incorrectly predicted branch. 


ALU: Arithmetic Logical Unit. The logic block that computes a result from 
one or two operands for any arithmetic or logical operation. The Super- 
SPARC processor contains three ALUs to process ALUops. 


ALUop: An ALU operation. ALU operations are any instructions that 
compute integer arithmetic results, such as ADD, SUBcc, MULScc. 
Instructions that compute logical results are also ALUops; examples include 
AND, ORcc, SETHI, and shift instructions. 


anti-dependency: A condition that occurs when the output of one instruc- 
tion overwrites an input of an earlier instruction which still requires its in- 
put. Instructions which might yet trap and be retried require their inputs 
to remain unaltered. See also dependency. 


arbiter: That which controls access to a shared resource among several 
potential users. Generally, an entity that requires the resource must request 
the right to use it from the arbiter. The arbiter grants access to requesters, 
choosing between them in some way such as priority order or fair 
allocation to all requesters. Arbiters are most often associated with 
multiple-master busses. 
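One common form of fair allocation is round-robin arbitration. The C sketch below is a generic illustration, not the arbitration scheme of any particular SuperSPARC bus; `rr_arbitrate` and the eight-requester width are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical round-robin arbiter for up to 8 requesters.
 * requests: bit i is set when requester i wants the resource.
 * last:     index of the requester granted most recently.
 * Returns the index granted, or -1 if nobody is requesting.
 * The search starts just after the last winner, so every requester
 * is eventually served. */
static int rr_arbitrate(uint8_t requests, int last)
{
    assert(last >= 0 && last < 8);
    for (int step = 1; step <= 8; step++) {
        int candidate = (last + step) % 8;
        if (requests & (1u << candidate))
            return candidate;
    }
    return -1;
}
```

A fixed-priority arbiter would instead always start the search at requester 0, which is simpler but can starve low-priority requesters.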
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ASI: Address Space Indicator. An ASI is logically appended to a 
processor-generated logical address before it is sent to the MMU. The MMU 
interprets the ASI value to control mapping and protection. The ASI can also 
select special address spaces for diagnostic and control access to struc- 
tures internal to the processor (e.g., cache tags) or external to the pro- 
cessor (e.g., external cache control). 


atomic operations: Operations that perform a memory read and a memory 
write without allowing other access to the addressed location between 
the read and write portions of the operation. The SWAP and LDSTUB 
instructions perform atomic operations. 


bandwidth: The capacity of a data transmission medium, expressed as 
information per unit time. Bandwidth is most often used here to describe the 
data-carrying capacity of busses. The units of bandwidth are usually 
megabytes per second. For example, a 64-bit bus running at 50 MHz has 
a bandwidth of at most 400 MB/s. After overhead for arbitration, address 
cycles, etc. is subtracted, the available bandwidth might be only 250 
MB/s. 
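The figures in this entry follow from simple arithmetic. The C sketch below reproduces them; the function names are hypothetical, and the 62.5 percent data-cycle fraction is an assumed value chosen only because it turns 400 MB/s into 250 MB/s.

```c
#include <assert.h>

/* Peak bandwidth in MB/s (1 MB = 10^6 bytes) for a bus that can move one
 * full-width transfer on every clock cycle. */
static double peak_mb_per_s(unsigned bus_width_bits, double clock_mhz)
{
    return (bus_width_bits / 8.0) * clock_mhz;  /* bytes per cycle x million cycles/s */
}

/* Bandwidth left after overhead, given the fraction of cycles that carry
 * data (the rest go to arbitration, address cycles, and so on). */
static double effective_mb_per_s(double peak, double data_cycle_fraction)
{
    assert(data_cycle_fraction >= 0.0 && data_cycle_fraction <= 1.0);
    return peak * data_cycle_fraction;
}
```

Here `peak_mb_per_s(64, 50.0)` returns 400.0 (8 bytes x 50 million cycles per second), and `effective_mb_per_s(400.0, 0.625)` returns 250.0.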


BIST: Built-in Self-Test. Any internal mechanism that can perform testing 
without an external test controller. The SuperSPARC processor and 
MultiCache Controller have scan-based BIST. Scan-based BIST uses a 
pseudo-random pattern generator to provide patterns to the internal 
scan paths. The patterns stimulate logic on the chip; the results of each 
pattern's action on the logic are captured in registers and scanned into 
a signature analyzer. The BIST controller performs many generate, 
scan, stimulate, capture, and analyze cycles, and the signatures 
accumulate in the signature analyzer. When the BIST cycles are complete, 
the signature register contains a pattern that indicates probably good 
logic if the pattern matches a known good pattern for the device and 
revision. 
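The generate, stimulate, and compact loop can be modeled in software. This C sketch is a toy model under stated assumptions: a 16-bit Galois LFSR (taps 0xB400) stands in for the pattern generator, a shift-and-XOR register stands in for the signature analyzer, and `good_logic` and `faulty_logic` are hypothetical circuits. It does not describe the actual SuperSPARC BIST hardware or its polynomials.

```c
#include <assert.h>
#include <stdint.h>

/* One step of a 16-bit Galois LFSR (maximal-length taps 0xB400),
 * standing in for the pseudo-random pattern generator. */
static uint16_t lfsr_next(uint16_t s)
{
    uint16_t lsb = s & 1u;
    s >>= 1;
    if (lsb)
        s ^= 0xB400u;
    return s;
}

/* Fold one captured response into the signature: shift like the LFSR,
 * then XOR in the response (a simple multiple-input signature register). */
static uint16_t misr_step(uint16_t sig, uint16_t response)
{
    return (uint16_t)(lfsr_next(sig) ^ response);
}

/* Hypothetical logic under test: maps a stimulus pattern to a response. */
typedef uint16_t (*logic_fn)(uint16_t pattern);

static uint16_t run_bist(logic_fn logic, unsigned cycles)
{
    uint16_t pattern = 0xACE1u;   /* any nonzero seed */
    uint16_t sig = 0;
    for (unsigned i = 0; i < cycles; i++) {
        sig = misr_step(sig, logic(pattern));   /* stimulate, capture, compact */
        pattern = lfsr_next(pattern);           /* generate the next pattern */
    }
    return sig;
}

static uint16_t good_logic(uint16_t p) { return (uint16_t)(p ^ (p >> 3)); }

/* Same circuit with output bit 0 stuck at 1. */
static uint16_t faulty_logic(uint16_t p) { return (uint16_t)(good_logic(p) | 0x0001u); }
```

In this model the stuck-at fault is invisible for the first pattern (the good output already has bit 0 set) but changes the signature after the second. Comparing `run_bist(faulty_logic, n)` against the known good signature is the "probably good" check the entry describes; compaction can in principle alias a faulty result to the good signature, which is why the result is only "probably" good.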


boot mode: A special MMU bypass mode in which all instruction fetches 
and accesses through the instruction access ASIs (0x08 and 0x09) 
generate the physical address by passing virtual address bits 27 through 0 
unaltered and setting the upper eight bits (bits 35 through 28) of the 
physical address to 1s (0xFF). Hardware reset enables this translation 
mode. 
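The address formation described in this entry is simple bit manipulation. In this C sketch, `boot_mode_pa` and `boot_mode_applies` are hypothetical helpers, and the 36-bit physical address is held in a `uint64_t`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Boot-mode translation: keep VA bits 27..0 and force PA bits 35..28 to 1s. */
static uint64_t boot_mode_pa(uint32_t va)
{
    return (UINT64_C(0xFF) << 28) | (uint64_t)(va & UINT32_C(0x0FFFFFFF));
}

/* The translation applies to instruction fetches and to accesses through
 * the instruction access ASIs 0x08 and 0x09. */
static bool boot_mode_applies(bool is_instruction_fetch, unsigned asi)
{
    return is_instruction_fetch || asi == 0x08 || asi == 0x09;
}
```

For example, a fetch from virtual address 0xF1234567 goes to physical address 0xFF1234567: the upper virtual nibble is discarded and replaced by the 0xFF prefix.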




Subject to Change Without Notice 





branch target queue: A FIFO buffer that holds instructions to be issued if 
the conditional DCTI in progress chooses to transfer control. By 
prefetching the instructions at the target of the CTI before it is known if they will 
be needed, the SuperSPARC processor can execute control transfer 
instructions more quickly. 


breakpoint: A point to interrupt the execution of a program for debugging 
purposes. A breakpoint can be raised by an instruction (such as SIGM) 
placed into the program or by access to an address within the range of 
address breakpoint comparators. 


BRop: A branch operation. The branch operations are: Bicc, FBfcc, JMPL, 
CALL, and RETT instructions. 


bubble: An instruction group containing no instructions. A bubble is most 
often the result of an empty instruction queue. The empty group pro- 
ceeds through the pipeline stages like any other group, but contains no 
instructions. A bubble cannot cause a trap, not even for an interrupt. 


bus keeper: A buffer whose input is connected to the bus and whose output is 
connected to the bus via a relatively large resistor or other means to limit 
its output current. This circuit keeps the bus at the last driven value when 
it is not driven. Bus keepers prevent the bus signals from drifting to 
voltages near the threshold, which could generate noise on chips. 


bus master: The unit that initiates a transaction on a bus. The master sends 
a request or command to the bus slave. 


bus slave: The unit addressed by the bus master in a transaction on a bus. 
The slave acts on the request or command and responds with data or an 
acknowledgement. 


bus watcher: A device or unit that interfaces between XBus and a system 
bus. A bus watcher (BW) must convert bus protocols as needed, manage 
communications over XBus with the MXCC, and manage communica- 
tions over the system bus with main memory, peripheral devices, and 
other processors. A BW must keep a duplicate set of E-cache tags, 
snoop transactions on the system bus, and notify the MXCC of any 
changes in cache sub-block's state based on the system bus's consis- 
tency algorithm. 
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cache: A small, fast memory close to the processor that can act as a 
surrogate for some words of main memory. The cache establishes temporary 
associations between words in the cache memory and main memory ad- 
dresses. On a memory reference, the processor checks the cache 
memory for such an association, and, if there is one, the copy in cache 
memory is used instead of the one in main memory. Cache memories are 
usually managed automatically, with words being transferred from main 
memory into the cache memory without any program intervention. 


cacheability: A property of each memory access that determines whether 
the data will be stored in caches and should therefore interact with other 
caches to maintain cache consistency. Cacheability can be determined 
in several ways, but normal accesses when the MMU is enabled 
determine cacheability from the page table entry (PTE) of the virtual page. 


cache block: Cached data sharing a single address tag. Block sizes in a 
SuperSPARC system vary between 64 and 128 bytes, depending on the 
cache and the configuration. A block may have several sub-blocks, each 
of which may be separately valid or invalid. 


cache consistency: The state in which all caches and main memory have 
the same values for all valid copies of data. Cache memories function by 
making copies of data. The copies might become inconsistent with main 
memory or with each other, which could lead to unexpected or incorrect 
results. 


cache line: A term some authors prefer to cache block. Cache line and 
cache block are synonymous. This User's Guide uses "cache block." 


cache set: The group of cache blocks from which to choose when accessing 
data from some single address. If there is only one block in which the data 
for the address may be found, the cache is called "direct mapped". If the 
data for the address might be found in any block in the cache, the cache 
is called "fully associative". If the data might be found in one of a small 
number (n) of blocks in the cache, the cache is called "n-way set 
associative". Each address has a set of n cache locations with which it might be 
associated. The number of blocks in the cache is the product of the 
number of sets and the associativity. 
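As a sketch of how an address selects a set, the C fragment below splits an address into a set index and a tag. The geometry type and helpers are hypothetical, and the sizes used in the usage note are illustrative only; a direct-mapped cache is the case `ways == 1`, and a fully associative cache is the case `num_sets == 1`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cache geometry, for illustration only. */
typedef struct {
    uint32_t block_bytes;   /* bytes per cache block */
    uint32_t num_sets;      /* number of sets */
    uint32_t ways;          /* associativity n (1 = direct mapped) */
} cache_geom;

/* Total blocks = number of sets x associativity. */
static uint32_t total_blocks(cache_geom g)
{
    return g.num_sets * g.ways;
}

/* The set an address maps to; its data may live in any of the g.ways
 * blocks of that set. */
static uint32_t set_index(cache_geom g, uint64_t addr)
{
    assert(g.block_bytes > 0 && g.num_sets > 0);
    return (uint32_t)((addr / g.block_bytes) % g.num_sets);
}

/* The remaining high-order address bits, stored in the cache tag. */
static uint64_t tag_of(cache_geom g, uint64_t addr)
{
    return addr / ((uint64_t)g.block_bytes * g.num_sets);
}
```

With 64-byte blocks, 128 sets, and 4 ways, the cache holds 512 blocks, and address 0x12345 falls in set 13 with tag 9.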


cache sub-block: If a cache block has more than one set of data storage 
locations with separate valid (or invalid) bits, each set of locations with 
its own valid bit is a sub-block. Sub-blocks allow the amount of data 
transferred to or from memory to be smaller than the amount of cached data 
controlled by a single address tag. 
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cache tag: The address and other information associated with a word or 
group of words that form an entry in a cache memory. The address indi- 
cates the main memory address of the word(s) in the cache entry. Other 
tag information includes an indicator for valid or invalid entries. 


circuit-switched bus: A circuit-switched bus is busy from the beginning of 
a transaction until the transaction is complete. So, when a master uses 
the bus to request data from a slave, the bus is unavailable for other use 
until the slave delivers the data or an error. A circuit-switched bus, there- 
fore, delivers a lower bandwidth than a packet-switched bus in similar 
technology but does not require complicated interface controllers. 


clean: A term applied to a cache block that has not been altered since it was
read from memory. See also dirty (clean is the opposite of dirty).


context: A single hardware-supported address space. The MMU supports 
contexts as a way to map the separate address spaces and protections 
of different processes onto the memory provided in hardware. A context 
normally corresponds one-to-one with software processes. 


copy-back: Writing the contents of a dirty cache block or sub-block to 
memory. Copy-back is initiated when a dirty cache block is replaced or 
flushed. 


CPI: Cycles Per Instruction (usually the average number of cycles per
instruction). Lower numbers yield higher performance. CPI is
determined by dividing the total number of cycles to run a program by the total
number of instructions executed by the program.
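The CPI definition above, together with its reciprocal IPC (defined later in this glossary), reduces to two divisions. The function names below are hypothetical, and the cycle and instruction counts would come from whatever measurement the reader has available.

```c
#include <stdint.h>

/* CPI: total cycles divided by total instructions executed. */
static double cpi(uint64_t cycles, uint64_t instructions)
{
    return (double)cycles / (double)instructions;
}

/* IPC is the reciprocal of CPI: instructions divided by cycles. */
static double ipc(uint64_t cycles, uint64_t instructions)
{
    return (double)instructions / (double)cycles;
}
```

With illustrative counts of 200 instructions in 100 cycles, cpi(100, 200) is 0.5 and ipc(100, 200) is 2.0. A CPI below 1 is only possible when a processor issues more than one instruction per cycle.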


CTI: Control Transfer Instruction. Any instruction that alters the normal
sequential execution of instructions. The SPARC CTIs are the Bicc, FBfcc,
CALL, JMPL, RETT, and Ticc instructions. Most SPARC CTIs are actually
DCTIs.


CWP: Current Window Pointer. This register is defined in the SPARC
Architecture Manual. The CWP indicates which of the register windows
supported in a processor is currently accessed. A register specifier (either
a source or a destination register specifier) in an instruction selects one
of the 24 registers in a register window or one of the 8 global registers.


cycle: Clock cycle. The period from one active edge of the clock signal to the
next occurrence of the same edge. A cycle is the basic unit of timing for
processor operations, as most internal operations are performed in integral
numbers of cycles. See also phase.
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data forwarding: Sending the result of one instruction directly to the execu- 
tion of another instruction without first storing it in a register. Data for- 
warding saves one cycle in cases where the second instruction must wait 
for the result of the first. This is particularly useful in a pipelined proces- 
sor. 


DCTI: Delayed Control Transfer Instruction. A DCTI changes execution to a
new non-sequential location after first executing the one instruction that
sequentially follows the DCTI (the delay instruction). Most SPARC CTIs are
DCTIs. All Bicc and FBfcc instructions, except for BA,a and FBA,a, are
delayed. CALL and JMPL instructions are also delayed. Ticc is not delayed.


demand fetch: A block read into the instruction cache performed because 
an instruction in the block is needed immediately for execution. The
processor cannot continue until the demand fetch is satisfied.


demand miss: A block read from memory into the data cache performed
because a load instruction failed to find an association in the data cache for
the address being accessed. The processor will wait on the memory read 
before continuing. 


demap: An operation that removes one or more address translations from
the TLB. In XBus configurations, demap operations are also propagated 
to and received from the system bus. 


denormalized: A floating-point number with the smallest exponent of its
floating-point format. Because the exponent is already at its smallest
value, it cannot be decreased to shift the number into normal form, in which
the most significant 1 of the value sits in the bit position just beyond the
most significant bit of the fraction. Every normalized value has a 1 in this
position, so that bit is omitted from the stored format and its value of 1 is
implied; a denormalized number has no implied 1.
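A minimal sketch that tests for a denormalized single-precision value, assuming the IEEE 754 single format (1 sign bit, 8 exponent bits, 23 fraction bits) and a host that does not flush denormals to zero. A denormalized number has an all-zero exponent field and a nonzero fraction. The C99 fpclassify macro gives the same answer and is shown for comparison. The helper names are hypothetical.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Decode the IEEE 754 single-precision fields directly. */
static int is_denormal_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);        /* reinterpret without aliasing UB */
    uint32_t exp  = (bits >> 23) & 0xFFu;  /* 8-bit biased exponent field */
    uint32_t frac = bits & 0x7FFFFFu;      /* 23-bit fraction field */
    return exp == 0 && frac != 0;          /* smallest exponent, no implied 1 */
}

/* The same test through the standard C99 classification macro. */
static int is_denormal_std(float x)
{
    return fpclassify(x) == FP_SUBNORMAL;
}
```

Zero also has an all-zero exponent field, which is why the fraction must be checked: 0.0f is not denormalized, while a value such as 1e-40f (below the smallest normalized single, about 1.18e-38) is.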


dependency: The requirement to use a value or resource from an earlier 
instruction. A value dependency is the use of a register source computed 
in an earlier instruction. If the value has not yet been computed, the later 
instruction must wait for the dependent data. Similarly, an instruction 
may require access to a resource being used by another instruction and 
must wait for the earlier instruction to finish. See also anti-dependency. 
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dirty: A cache block is dirty if it has been altered in the cache by a store. Dirty
is the opposite of clean. Dirty cache blocks need to be copied back to 
memory if the block is replaced or flushed. 


dynamic grouping: Selecting the instructions to run simultaneously as a 
group, by examining the next few instructions available. This can be con- 
trasted against static grouping, in which instruction groups are marked 
by the compiler or programmer. 


Ecache: External Cache. An external cache memory supplements internal
instruction and data caches and provides a much larger cache using sev- 
eral external SRAMs. 


endian: The way in which the constituent parts of a whole data structure are 
organized. The big endian organization places the first byte of a word in 
the highest order part of the word, while the little endian organization 
places the first byte of a word in the lowest order part of the word. The 
SPARC architecture version 8 is a big endian architecture. 
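A short sketch of the two byte orders described above. It assembles a 32-bit word from four bytes at increasing addresses; because it uses shifts rather than casting a pointer, it gives the same result on any host. The function names are hypothetical.

```c
#include <stdint.h>

/* Big endian (SPARC V8): the byte at the lowest address is the most
 * significant byte of the word. */
static uint32_t load_be32(const uint8_t b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Little endian, for contrast: the first byte is the least significant. */
static uint32_t load_le32(const uint8_t b[4])
{
    return ((uint32_t)b[3] << 24) | ((uint32_t)b[2] << 16) |
           ((uint32_t)b[1] << 8)  |  (uint32_t)b[0];
}
```

For the bytes 0x12, 0x34, 0x56, 0x78 in memory order, load_be32 yields 0x12345678 and load_le32 yields 0x78563412.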


error: A hardware-detected fault that requires software intervention. An
error is frequently not recoverable and may require that the affected
process be terminated. An example of an error is a bus timeout.


error mode: A SPARC processor enters error mode if it encounters a trap-
ping exception or error while the PSR's enable traps (ET) flag is off. In 
error mode, the processor responds only to reset. 


exception: A condition that requires software intervention. An exception is 
frequently recoverable to allow the program to continue after a handler 
has intervened. An example of an exception is when the MMU detects 
a missing page translation for a page that is on disk rather than in 
memory (page fault). 


exclusive: A cache block is in an exclusive state if no other cache in the
system has a copy of it. Exclusive is the opposite of shared.
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fault: A hardware or software failure that may be manifested as an error. The
fault is what is broken (for example an open wire or a shorted transistor)
while the error is the incorrect operation or result caused by the fault. See 
error. 


fcc: Floating point condition code. A two-bit field in the floating-point state 
register (FSR) which encodes the results of floating-point compare 
(FCMP) instructions. Floating-point branch on floating-point condition 
code (FBfcc) instructions test the fcc field to decide whether to take a 
conditional branch. The two bits encode equal, greater, less, or
unordered.
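A minimal model of the two-bit result a floating-point compare leaves in fcc. It assumes the SPARC V8 encoding (0 = equal, 1 = less, 2 = greater, 3 = unordered); the enum and function names are hypothetical. "Unordered" arises when at least one operand is a NaN, because every ordered comparison involving a NaN is false.

```c
/* fcc encoding assumed from SPARC V8: 0 = equal, 1 = less,
 * 2 = greater, 3 = unordered. */
enum fcc { FCC_E = 0, FCC_L = 1, FCC_G = 2, FCC_U = 3 };

/* Model the fcc value an FCMPd of rs1 against rs2 would produce. */
static enum fcc fcmp_model(double rs1, double rs2)
{
    if (rs1 == rs2) return FCC_E;   /* note: -0.0 == +0.0 */
    if (rs1 < rs2)  return FCC_L;
    if (rs1 > rs2)  return FCC_G;
    return FCC_U;                   /* at least one operand is a NaN */
}
```

An FBfcc instruction then tests this value; for example, a branch on "unordered" is taken only when fcc is 3.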


flush: To remove from a cache in such a way as to cause dirty data to be
copied back to memory. SuperSPARC has no direct way for software to
invoke a flush of the data cache.


forwarding: See data forwarding. 


FPev: Floating Point Event. FPevs are integer instructions that interact with 
the FPU. Loads into floating-point registers, stores from floating-point 
registers, and branches based on floating-point condition codes are all 
FPevs. 


FPop: Floating-Point Operation. An operation that takes operands only from 
the floating-point registers and places results in the floating-point regis- 
ters. FADDS (Floating-Point Add Single) is an example of an FPop. 


FPU: Floating Point Unit. The part of the processor that computes floating- 
point results. It also contains the floating-point registers. The FPU
executes all FPops and participates with the IU in FPevs. 


global register: Integer registers accessible from any register window. 
Global registers are the only general purpose integer registers in a 
SPARC processor that are not affected by changing the CWP. 


group: See instruction group.


GTL: Gunning Transceiver Logic. A high-performance, low-voltage signal- 
ling technology used on MXCC in XBus configurations. 
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Harvard architecture: An architecture named after the Harvard Mark-III
and Mark-IV computers, which had separate memories for data and
instructions. Today the term is most commonly used to describe comput- 
ers with separate instruction and data caches, even though those caches 
store information from a single memory address space. That is how this 
term is used in this document. 


hit: The event that occurs when an access attempt to a cache memory finds an
association for the address of the access. The data can be accessed im- 
mediately in the cache. 


hold: A pipeline hold is the same as a stall. 


Subject to Change Without Notice 


icc: Integer condition codes. A four-bit field in the PSR that contains the
integer condition code bits. The icc bits are updated by ALUop instructions
whose names end in "cc" (for example, ADDcc). ALUops without the "cc"
suffix and other types of instructions do not modify the icc. The branch
on integer condition codes (Bicc) instructions test the condition codes
to decide whether to take a conditional branch. Extended arithmetic
instructions (for example ADDXcc) use the carry condition code as an 
input. The four condition code bits are C (carry), Z (zero), N (negative) 
and V (overflow). 
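A minimal sketch of how the four icc bits could be computed for an add, assuming the standard SPARC V8 definitions for addition. The icc_t type and the addcc/addxcc helpers are hypothetical C models named after the instructions, not real APIs. addxcc shows the carry bit being used as an input, which is how a 64-bit add can be built from two 32-bit adds.

```c
#include <stdint.h>

/* The four integer condition code bits. */
typedef struct { int n, z, v, c; } icc_t;

/* Set N, Z, V, C from the operands and the 32-bit result of an add. */
static void set_add_icc(uint32_t a, uint32_t b, uint32_t r, int carry, icc_t *cc)
{
    cc->n = (int)(r >> 31);                     /* sign bit of the result */
    cc->z = (r == 0);
    /* Overflow: operands share a sign that differs from the result's. */
    cc->v = (int)((~(a ^ b) & (a ^ r)) >> 31);
    cc->c = carry;                              /* carry out of bit 31 */
}

/* Model of ADDcc: a + b, updating all four bits. */
static uint32_t addcc(uint32_t a, uint32_t b, icc_t *cc)
{
    uint64_t wide = (uint64_t)a + b;
    uint32_t r = (uint32_t)wide;
    set_add_icc(a, b, r, (int)(wide >> 32), cc);
    return r;
}

/* Model of ADDXcc: a + b + C, using the incoming carry as an input. */
static uint32_t addxcc(uint32_t a, uint32_t b, icc_t *cc)
{
    uint64_t wide = (uint64_t)a + b + (uint32_t)cc->c;
    uint32_t r = (uint32_t)wide;
    set_add_icc(a, b, r, (int)(wide >> 32), cc);
    return r;
}
```

Adding 1 to 0x7FFFFFFF sets N and V but not C; adding 1 to 0xFFFFFFFF sets Z and C. Following the latter with addxcc(0, 0, &cc) on the high words produces 1, propagating the carry.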


in registers: The in registers of a register window are the same as the out 
registers of the caller's window. The in registers can be used to receive 
arguments from the caller and to return results to the caller.


in-order completion: A term applied to instructions that complete in the or- 
der in which they were issued. 


instruction group: A set of instructions from a program that are issued to- 
gether in a superscalar processor and which execute simultaneously. 


instruction issue: A step in the execution of an instruction where its execu-
tion is begun. The point of issue is usually beyond instruction fetch and 
at the point where the instruction is decoded. An instruction must be is- 
sued to be executed; however, since some instructions may be aborted 
due to traps, not every instruction that is issued will complete execution. 


interrupt: A notice of an event, usually asynchronous and external to the 
processor, that requires attention of the processor. 


invalidate: An operation that removes an entry from a cache by marking the 
entry invalid. No data is transferred by an invalidate operation. An
invalidated entry is no longer accessible and is available for allocation to a new
entry. 


IPC: Instructions per Cycle. The inverse of CPI. Higher numbers indicate 
higher performance. IPC is determined by dividing the total number of 
instructions executed to run a program by the total number of cycles to 
run the program. 


IU: Integer Unit. In the SPARC architecture, the integer unit controls program
execution, initiates memory operations, and computes integer results.
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JTAG: Joint Test Action Group, IEEE 1149.1, serial scan test access
facilities. The JTAG test access port gives a five wire test connection to many
integrated circuits to facilitate board and system testing. 


keeper: See bus keeper. 


latency: The time from starting an operation until the result is available for 
use by another instruction. See also throughput. 


local register: The registers in a register window that are not shared with 
either of the adjacent register windows and are, hence, local to the win- 
dow. These registers can be used as private registers by a procedure. 


lock (in cache): A term applied to cache blocks that can be locked in the cache.
Locked entries are never selected for replacement, so a locked block
remains in the cache.


locked operation: On a bus, any sequence of bus accesses performed in 
such a way as to prevent any other bus master from accessing the bus 
during the sequence of accesses. The SPARC architecture has several 
atomic operations that may be implemented using locked bus opera- 
tions. Each of the SPARC atomic operations requires one read and one 
write to be performed on the addressed location. 
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MEMop: Memory operation. Load and store instructions, including atomic 
load/store instructions, are MEMops. 


memory reference: An access to cache memory or main memory by a load 
or store instruction. 


message: See packet. 


miss: An attempt to access data in a cache that fails because there is no 
association between the address of the access and a valid entry in the 
cache. A load or fetch miss normally initiates an automatic read of the 
requested memory from main memory or the next level of cache. Once 
the read completes, the data may be accessed. 


MMU: Memory Management Unit. A unit of a processor that performs func- 
tions related to memory address translation and protection. 


multiprocessing: Using more than one processor in a computer system. 
See also shared memory multiprocessing. 


normalized: Cast into normal form. Floating-point numbers are usually
normalized (see denormalized). The normalized format requires that 
the exponent be adjusted until the fraction can be represented with an 
implied 1 in the bit position beyond the highest order bit of the fraction. 


nPC: Next Program Counter (PC). The address of the next instruction to 
execute. In an architecture with delayed control transfer instructions, a
CTI changes the next value of nPC rather than the next value of PC. It 
is necessary for both PC and nPC to be saved on a trap or interrupt be- 
cause, during the execution of the delay instruction after a DCTI, the nPC 
has the address of the destination of the control transfer. 
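A toy model of the PC/nPC pair, to show why both must be saved. Everything here is hypothetical: instructions are reduced to "ordinary" or "delayed, always-taken branch", and annulment, traps, and real encodings are ignored. An ordinary instruction sets the next nPC to nPC + 4; a DCTI writes its target into nPC instead, so the instruction at the old nPC (the delay instruction) still executes before control reaches the target.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy instruction: ordinary, or a delayed always-taken branch. */
typedef struct { int is_dcti; uint32_t target; } insn;

/* Execute `steps` instructions starting at pc, recording each executed
 * address in trace[]. prog[] is indexed by word address (pc / 4). */
static void run(const insn *prog, uint32_t pc, size_t steps, uint32_t *trace)
{
    uint32_t npc = pc + 4;
    for (size_t i = 0; i < steps; i++) {
        trace[i] = pc;
        const insn *in = &prog[pc / 4];
        /* A DCTI changes the *next* nPC, not the next PC. */
        uint32_t next_npc = in->is_dcti ? in->target : npc + 4;
        pc = npc;        /* the delay instruction runs next */
        npc = next_npc;
    }
}
```

With a branch to address 16 at address 4, the trace is 0, 4, 8, 16, 20: the delay instruction at 8 runs and 12 is skipped. While the instruction at 8 executes, PC is 8 but nPC already holds 16, so a trap at that point must save both values to resume correctly.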


out registers: The registers of the register window that are the same as the 
in registers of a called routine after executing a SAVE. They may be used 
to pass arguments to a called routine and to receive returned values from 
the called routine. 
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packet: A message consisting of a header and sometimes data. A packet 
may be either a request packet or a reply packet. A request packet is sent 
to a bus slave. A reply packet is returned later. 


packet header: The portion of a packet that contains addressing informa- 
tion, commands and other control information. The remainder of the 
packet contains its payload of data. 


packet-switched bus: A bus that communicates messages containing
requests and replies. The messages are called packets. The bus is free
for other messages during the time between a request and its reply. A
packet-switched bus requires more complicated interface controllers
than the more widely used circuit-switched buses.


page: The smallest unit of memory address translation. A single entry in the
TLB or page tables maps a range of addresses called a page. In the
SPARC Reference MMU, a page is 4 Kbytes.


page table: A collection of virtual to physical page address translations, 
each of which is called a page table entry (PTE). The SPARC Reference 
MMU uses page tables organized into a tree data structure in order to 
save space since most of the address space for a process is never valid. 
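A sketch of how a 32-bit virtual address could be split for a three-level SPARC Reference MMU walk. The field layout is inferred from the mapping sizes in this glossary (16-Mbyte region, 256-Kbyte segment, 4-Kbyte page) and should be checked against the SRMMU specification. The va_fields type and split_va function are hypothetical.

```c
#include <stdint.h>

/* Virtual-address fields for a three-level table walk (assumed layout). */
typedef struct {
    uint32_t index1;   /* VA[31:24]: level-1 table, 256 entries, 16 Mbytes each */
    uint32_t index2;   /* VA[23:18]: level-2 table, 64 entries, 256 Kbytes each */
    uint32_t index3;   /* VA[17:12]: level-3 table, 64 entries, 4 Kbytes each */
    uint32_t offset;   /* VA[11:0]:  byte offset within the 4-Kbyte page */
} va_fields;

static va_fields split_va(uint32_t va)
{
    va_fields f;
    f.index1 = va >> 24;
    f.index2 = (va >> 18) & 0x3Fu;
    f.index3 = (va >> 12) & 0x3Fu;
    f.offset = va & 0xFFFu;
    return f;
}
```

For VA 0x12345678 this gives index1 = 0x12, index2 = 13, index3 = 5, and offset = 0x678. Only the level-1 table must exist in full; lower-level tables are allocated only for the parts of the address space that are actually valid, which is where the tree structure saves space.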


pipelining: A technique for increasing the throughput of a processor by di- 
viding the processing of every instruction into a few steps. The steps are 
then overlapped so that the first step of an instruction is processed at the
same time as the second step of the previous instruction and the third 
step of the instruction before that and so on. 


pipeline stage: Processor logic bounded by registers on each side and cor- 
responding to one of the steps in the execution of an instruction. 


phase: Clock phase, or one-half of a clock cycle. The phase corresponds 
either to the time when the clock is high or when it is low. Many actions 
in the SuperSPARC processor occur in particular clock phases. 


physical address: An address used to access memory and I/O devices on
the bus. The physical address is generated by the MMU from the virtual 
address calculated by the processor. 

physically addressed cache: A cache memory in which the address tags of
words in the cache are physical rather than virtual addresses.
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prefetch: A memory access performed because the data is likely to be used in the
near future. A prefetch may be either an instruction prefetch or a data 
prefetch. Prefetching is generally autonomous to program execution, al- 
lowing the processor to continue while the prefetch is performed. 


prefetch queue: A FIFO buffer that holds instructions from the instruction 
cache before they are issued. 


privileged instruction: An instruction that accesses features that could
compromise system security or reliability and is therefore restricted to
trusted supervisory software. A privileged instruction can be executed
only in supervisor mode.


privileged mode: See supervisor mode. 


program order: The order in which the instructions in a program would be 
executed on a strictly sequential processor which completed each 
instruction before fetching the next. Such implementation techniques as 
pipelining, superscalar execution, speculative execution, and branch 
prediction all execute at least some portion of most instructions in orders
other than program order. Within a processor, these variations from pro- 
gram order cannot be observed by the program, but I/O, memory and 
other processors may see some variations in order. SPARC's memory 
models allow the selection of the types of variations from program order 
that the program can tolerate. 


PSR: Processor State Register. This register is defined as part of the SPARC
architecture. It has a number of bits that show program execution status 
and others that control program execution. 


PSO: Partial Store Ordering. One of the memory models selectable in a 
SPARC processor. In PSO, even though store operations are issued in 
program order, they may not complete in program order. If the program
requires that some pair of store operations complete in program order, 
a special store barrier instruction (STBAR) should be placed between 
them. PSO is the highest performing of the SPARC memory models. 

PTE: Page Table Entry. An entry in the page tables that contains a final
address translation with permissions, as opposed to a PTP that contains
only the address of another part of the page tables.


PTP: Page Table Pointer. An entry in the page tables that contains the ad- 
dress of the next portion of the tree-structured page tables to consult, 
rather than the final address translation and permissions as in a PTE. 
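A sketch of how a table walk might tell a PTP from a PTE. It assumes the SPARC Reference MMU convention that the low two bits of each entry hold an entry type (0 = invalid, 1 = page table pointer, called a page table descriptor in the V8 specification, 2 = PTE, 3 = reserved). The names below are hypothetical. A walk follows PTPs until it reaches a PTE; the level at which the PTE is found fixes the size of the mapping.

```c
#include <stdint.h>

/* Entry type in the low two bits of a page-table entry (assumed encoding). */
enum entry_type { ET_INVALID = 0, ET_PTP = 1, ET_PTE = 2, ET_RESERVED = 3 };

static enum entry_type classify_entry(uint32_t entry)
{
    return (enum entry_type)(entry & 0x3u);
}

/* Bytes mapped by a PTE found at a given table level (1-3). */
static uint32_t mapping_size(int level)
{
    switch (level) {
    case 1: return 16u << 20;   /* region: 16 Mbytes */
    case 2: return 256u << 10;  /* segment: 256 Kbytes */
    case 3: return 4u << 10;    /* page: 4 Kbytes */
    default: return 0;          /* not a valid table level */
    }
}
```

A PTE found in the level-1 table therefore maps a whole region, and a walk that reaches level 3 maps a single page.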


pure consistency operation: A bus operation that transfers no data but
affects the state of a cached block. A pure consistency operation is only
needed to carry out the cache consistency algorithm, not to supply data. 
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queue: Any buffer that stores entries on a first-in first-out (FIFO) basis.
Examples are the SuperSPARC processor's store buffer and prefetch buffer.


read ports: The number of interfaces available for reading, and hence the 
number of concurrent read operations that can be supported on a 
memory or register file. 


region: One of the large mapping sizes supported by the SPARC Reference
MMU. A region is 16 Mbytes of contiguous memory and is mapped by
a single PTE in the level 1 page table. 


replacement: The removal of an entry from a TLB or cache memory to make
room for another entry that must be brought into the TLB or cache. The
choice of which entry to replace is called the replacement algorithm. If the
entry being replaced is dirty, it must also be copied back.


reset: An operation that restores certain parts of the state of a device. On 
the SuperSPARC processor, a reset operation sets just enough state for 
the device to enter the reset handler reliably from any state. The reset 
operation can be initiated by an external signal, by the JTAG TAP
controller, by error mode, or by completion of BIST.


segment: One of the large mapping sizes supported by the SPARC Reference
MMU. A segment is 256 Kbytes of contiguous memory mapped by
a single PTE in the level-2 page table. 


set associativity: n-way set associativity. A term applied to a cache in which
the data for some address might be found in one of a small number (n)
of blocks in the cache. Each address has a set of n cache locations with
which it can be associated. The number of blocks in the cache is
determined by multiplying the number of sets by the associativity.


shared: The state of a cache block in which one or more other caches in the
system have a copy of it. Shared is the opposite of exclusive.
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shared memory multiprocessing: A shared memory multiprocessing
system has more than one processor sharing the same main memory. The
processors can address the same memory and expect to see a consis- 
tent time series of values in the memory locations. See cache consis- 
tency. 


snooping: On a bus, the ability of an interface to observe transactions
for which it is neither the master nor the addressed slave. Snooping is
an important part of many protocols for maintaining cache consistency
in bus-based systems.


SO: Strong Ordering. One of the SPARC memory models. All load and store 
operations under SO complete globally in program order. SO is the low- 
est performing of the SPARC memory models. 


software pipelining: Spreading the execution of each loop iteration over 
several actual trips through the loop. 


SPARC: Scalable Processor Architecture. SPARC is an open processor ar- 
chitecture controlled by SPARC International, an industry consortium. It 
is defined in a published specification, the SPARC Architecture Manual. 
The architecture and specifications are versioned. The SuperSPARC 
processor is conformant to SPARC version 8. 


SPARC Reference MMU: The SPARC Reference MMU is a memory man- 
agement unit architecture specified in The SPARC Architecture Manual. 


speculative instruction issue: A term applied to an instruction that is is- 
sued speculatively before it is known whether program flow will reach the 
instruction. Any instruction issued speculatively may need to be aborted. 
In a sense, any instruction is issued speculatively if it is started before a 
previous instruction that might trap has completed without trapping. This 
term usually applies to instructions after a branch on either the taken or 
untaken path of program execution issued before the branch direction 
has been decided. 


SRAM: Static Random Access Memory. SRAM's low access time makes it 
a desirable memory technology for many high speed applications. 
Another advantage is that, unlike some RAM technologies, SRAM reads do
not destroy the data, so reading a bit does not require rewriting it.


SRMMU: SPARC Reference MMU. See above. 
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stall: A cycle in which no stage of the pipeline advances. No instruction in
the pipeline proceeds beyond its current step. No traps (even interrupts) 
can occur, and no results are stored. See also bubble. 


store buffer: A FIFO queue of memory store operations. Each entry has the
data to be stored, the main memory address at which to store it, and such 
other information as the size of the store operation. The store buffer al- 
lows the store instruction initiating the store operation to complete while 
the store buffer manages the store operation as it gets sent to main 
memory. 
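A minimal ring-buffer sketch of a store buffer, including the TSO rule (see TSO) that a load to an address with a pending store must wait. The depth, the restriction to aligned word stores, and all names are hypothetical simplifications; a real entry would also record the size of each store.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SB_DEPTH 8   /* hypothetical depth, not SuperSPARC's */

typedef struct { uint32_t addr, data; } sb_entry;   /* word stores only */

typedef struct {
    sb_entry e[SB_DEPTH];
    size_t head, count;   /* head indexes the oldest entry */
} store_buffer;

/* Queue a store so the store instruction can complete; false when full
 * (the pipeline would have to stall). */
static bool sb_push(store_buffer *sb, uint32_t addr, uint32_t data)
{
    if (sb->count == SB_DEPTH) return false;
    sb->e[(sb->head + sb->count) % SB_DEPTH] = (sb_entry){addr, data};
    sb->count++;
    return true;
}

/* Send the oldest store to memory (a word array), preserving FIFO order. */
static bool sb_drain(store_buffer *sb, uint32_t *mem)
{
    if (sb->count == 0) return false;
    const sb_entry *x = &sb->e[sb->head];
    mem[x->addr / 4] = x->data;
    sb->head = (sb->head + 1) % SB_DEPTH;
    sb->count--;
    return true;
}

/* True if a load from addr must wait for a pending store to that word. */
static bool sb_pending(const store_buffer *sb, uint32_t addr)
{
    for (size_t i = 0; i < sb->count; i++)
        if (sb->e[(sb->head + i) % SB_DEPTH].addr / 4 == addr / 4)
            return true;
    return false;
}
```

Loads to addresses with no matching entry can proceed past the buffered stores, which is the reordering TSO permits. Because sb_drain always retires the oldest entry, stores still reach memory in program order.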


superscalar execution: Execution in which a processor issues more
than one instruction at a time from a single program. These
simultaneously issued instructions are executed concurrently.


supervisor mode: The mode in which a SPARC processor is said to be 
executing when PSR.S=1. In supervisor mode, privileged instructions
may be executed. Supervisor mode and user mode are complementary; 
the processor always executes instructions in one mode or the other. 
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TBR: Trap Base Register. This register is defined in The SPARC Architec- 
ture Manual. The trap base register locates the system trap table, which 
contains entries for the many trap types supported in the architecture.


test access port (TAP) controller: An internal sequencer that manages ac- 
cess to all JTAG test data registers. 


three-state buffer: An output buffer that may drive its output
signal high or low, or may be in a high impedance state where it does not 
affect the output signal. Bi-directional pins usually have three-state buff- 
ers. 


throughput: The rate at which data processing operations can be
performed. Throughput can also be the reciprocal of the amount of time from
starting an operation until another operation can be started. If the
operation is pipelined, this time can be much shorter than the latency of the
operation. For example, a floating-point multiply may have a three-cycle
latency but a one-cycle throughput. This means that the multiplier can
produce one product after three cycles and four products after six cycles.
This multiplier can produce results at the peak rate of one every
cycle. If it is operating at 50 MHz, its peak throughput is 50 million
multiplies per second.
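The multiplier arithmetic above can be checked with a short sketch. It assumes that a new operation starts every `interval` cycles from cycle 0 and that the unit never stalls; the function names are hypothetical.

```c
/* Results completed by a pipelined unit after t cycles, given the
 * operation latency and the issue interval (reciprocal throughput). */
static unsigned long results_after(unsigned long t, unsigned long latency,
                                   unsigned long interval)
{
    if (t < latency) return 0;             /* first result not ready yet */
    return (t - latency) / interval + 1;   /* one more every interval */
}

/* Peak operations per second at a given clock rate. */
static double peak_ops_per_second(double clock_hz, unsigned long interval)
{
    return clock_hz / (double)interval;
}
```

results_after(3, 3, 1) is 1 and results_after(6, 3, 1) is 4, matching the three-cycle-latency, one-cycle-throughput multiplier, and peak_ops_per_second(50e6, 1) is 50 million.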


TLB: Translation Lookaside Buffer. The TLB is a cache of address
translations that is part of the MMU. It is associative, based on the page number
portion of the virtual address. When an association is found in the TLB
for the page portion of the virtual address, the TLB produces the physical
page number portion of the physical address and properties of the 
translation, including cacheability and access permissions. 


trap: A vectored transfer of control to supervisor software through the trap 
table. Traps are caused by enabled exceptions, errors, resets, or inter- 
rupts. 


TSO: Total Store Ordering. One of the SPARC memory models that allows 
store operations to be buffered and then completed in the system at a 
later time. In TSO, all store operations complete in the order in which they
are issued, but load operations may complete before earlier stores, which
might still be held in the store buffer. A load to an address that has a store 
pending in the store buffer will wait for that store to complete. The perfor- 
mance of TSO lies between that of SO and PSO. 
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user mode: The mode in which a SPARC processor is said to be executing
when PSR.S=0. In user mode, any attempt to execute a privileged 
instruction generates a privileged instruction trap. User mode and su- 
pervisor mode are complementary; the processor always executes 
instructions in one mode or the other. 


virtual address: An address as generated by the processor for access to 
data and instructions. The virtual address is translated by the MMU into 
a physical address that is used to access memory and I/O devices. See 
also page. 


WIM: Window Invalid Mask. This register is defined as part of the SPARC ar- 
chitecture. It marks register windows that are invalid. SAVE or RE- 
STORE instructions that change the current window to an invalid window 
cause a window overflow trap or a window underflow trap, respectively.
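A minimal sketch of the WIM check, assuming SPARC V8 semantics in which SAVE moves to window (CWP - 1) modulo NWINDOWS and RESTORE moves to window (CWP + 1) modulo NWINDOWS, with a hypothetical NWINDOWS of 8. Bit i of the WIM marks window i invalid; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define NWINDOWS 8u   /* assumed number of register windows */

/* SAVE targets window (CWP - 1) mod NWINDOWS; a set WIM bit there
 * means window overflow trap instead of completing the SAVE. */
static bool save_overflows(unsigned cwp, uint32_t wim)
{
    unsigned next = (cwp + NWINDOWS - 1u) % NWINDOWS;
    return (wim >> next) & 1u;
}

/* RESTORE targets window (CWP + 1) mod NWINDOWS; a set WIM bit there
 * means window underflow trap. */
static bool restore_underflows(unsigned cwp, uint32_t wim)
{
    unsigned next = (cwp + 1u) % NWINDOWS;
    return (wim >> next) & 1u;
}
```

With only window 7 marked invalid (WIM = 1 << 7), a SAVE from window 0 overflows because it would wrap to window 7, and a RESTORE from window 6 underflows.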


write-back cache: A cache in which store operations are performed on data
without also updating memory. When a block in a write-back cache is
replaced, it must be copied back to memory. A block that has been modified
in cache is called dirty.


write ports: The number of interfaces available for writing, hence the num- 
ber of concurrent write operations that can be supported on a memory 
or register file. 


write-through cache: A cache in which store operations on data update 
both the cache and main memory. When a block in a write-through cache 
is replaced, it can be invalidated without the need to copy back to 
memory, since memory already has the new data value. Cache blocks 
in a write-through cache never become dirty. 
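A hypothetical one-block model contrasting the two write policies described above. It counts memory write transactions for a run of stores that all hit the same cached block, followed by replacement. Note that the counts are transactions, not bytes: a copy-back writes a whole block, while each write-through writes only the stored data.

```c
#include <stdbool.h>

/* Hypothetical state for one cached block. */
typedef struct { bool dirty; unsigned mem_writes; } block_model;

/* A store that hits the block under the chosen policy. */
static void store_hit(block_model *b, bool write_back)
{
    if (write_back)
        b->dirty = true;      /* only the cached copy is updated */
    else
        b->mem_writes++;      /* write-through: memory updated every store */
}

/* Replace the block: only a dirty block needs a copy-back. */
static void replace_block(block_model *b)
{
    if (b->dirty) {
        b->mem_writes++;      /* copy the block back to memory */
        b->dirty = false;
    }
    /* a clean block is simply invalidated */
}
```

Ten stores followed by a replacement cost one memory write in the write-back case and ten in the write-through case, and the write-through block never becomes dirty.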
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aborted instruction, 5-2 
AC, 10-22 

bit, 10-37, 14-25 
ACC, field, 9-33 


access 
bus error code, 9-33 
exception, 5-5 
permission code, 9-4, 9-38 
to debug features, 15-6 
access exception, 5-5 
access type, 9-31, 9-33 
accrued exception field, 4-15 
ACTION ASI registers, 15-8 
action on breakpoint event register, 12-12 
ACTION register, 12-12, 15-3, 15-7, 15-9, 15-11 
ACTION.BCIPL, 12-12, 15-5 
ACTION.IEN_DBK, 15-5 
ACTION.MIX bit, 13-4, 13-5 
action-on-event register, 15-3 
ADD instruction, 5-23, 5-24 
additional integer, 5-6 
address 
breakpoint, 15-3 
facilities, 15-2 


dependencies, 5-6 
format, 10-15 
high byte command, 20-6 
low byte command, 20-6 
middle byte command, 20-6 
operands, 5-5, 5-6 
space, 2-2 

implicit alternate, 22-22 
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space identifier, 2-3, 2-6 
tag, 16-9 
translation, 9-3 
translation modes, 9-19 
addressing conventions, 2-5 
AE error, 16-43 
aexc, 4-15
allocate policy, 17-13 
altered non-emulation PC pair, 22-18 
alternate 
address space, 22-22 
cacheable bit, 8-6, 9-23, 10-22, 10-37 
space atomics, 8-5 
ALU, 4-4, 5-5, 5-6, 6-19 
ancillary state register, 4-10 
application code, 21-2 
approximate latencies for each emulator primitive, 22-30 
arbiter, MBus, 17-2 
arbitration, VBus, 18-8 
arbitration priorities, 19-44 
architecture, 3-7 
arithmetic, 2-7 
arithmetic/logical/shift instruction, 2-5
ASI, 2-3, B-1 
0x39, 13-8 
control spaces, 10-38 
memory references, 21-11 
operations, 8-6 
values, B-1 
ASR, 4-10 
assembly language, 21-2 
asynchronous scan interface, 3-5 
AT, 9-31 
atomic operations, 8-4, 10-35 


BA branching, 15-11 

BA instruction, 15-6 

backplane-level JTAG busmaster, 21-22 
bandwidth, 16-4 

basic instruction operations, 2-2 

BCIPL, 15-13 

BCLK, 17-42, 23-7 

BCLK and PCLK relationship, 23-7 
big-endian architecture, 2-5 
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binary floating-point arithmetic, 2-4 
BIST, 13-2, 13-7, 21-11 

ASI operation, 13-8 

coverage, 13-7 

JTAG-initiated, 21-11 

long version, 13-7 

mechanism, 21-11 

operation warnings, 13-9 

operations, 21-11 

register domain, 21-9 

register domains, 21-7 

Reset, 14-5 

sequencer, 21-9, 21-11 

short version, 13-7 

.SIGNATURE register, 13-7 

.STATUS, 13-8 
BKS.DBKIS bit, 15-5 
BKV, 15-8 
block 

number field, 16-17 

read, 8-10 

read facility, 16-30 

write, 16-30 
board-level JTAG busmaster, 21-22 
BootBus, 20-1 

address, 20-6 

address decoding, 20-5 

commands, 20-6 

controller, 20-6, 20-7 

example transactions, 20-9 

interface, 16-11 

introduction, 20-2 

signals, 20-3 

transactions, 20-6 
boot mode, 9-23, 13-4 
boot mode bit, 12-9, 13-6 
boot mode/local bus indicator, 17-10 
bootstrap loading, 20-1 
boundary scan, software-controlled, 21-2 
boundary scan map (Viking), 21-10 
branch, 5-19 

couple, 5-21 

direction, 6-2 

frequency, 6-2 

instruction, 22-16 

on floating-point condition codes instruction, 5-14 

performance, 6-2 
breakpoint 

ACTION register, 15-12 
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address compare mask, 22-25 
and counter-interrupt level, 12-12, 15-7 
code address, 22-25 
code and data address, 22-25 
control register, 15-8 
control register (BKC), 12-12 
control registers, 15-7 
data address, 22-25
hardware, 3-5 
mask register, 15-8 
status register, 15-9 
value register, 15-7 
BSCAN, 21-7, 21-10, 21-15 
BT, 9-24 
bubble, 6-10 
buffering, 3-4 
buffers and synchronizers, 16-11 
built-in self-test, 13-2, 13-7, 21-2 
burst mode access bit, 10-41 
burst read 
hit, 18-22 
miss, 18-24 
burst transaction, 8-10 
bus 
commands, 19-23 
data commands, 19-23 
tag commands, 19-32 
command logic, 16-9, 16-10, 16-11 
connection, 16-4 
cycle waveforms, 19-49 
error, 9-30, 12-15 
error response, 17-39, 17-47 
grant, 17-34 
keeper, 18-10, 18-18 
master, 17-1 
ownership, 19-46 
default grantee, 19-47 
no default grantee, 19-47 
protocol, 19-15 
cycles, 19-15 
packet detection, 19-16 
packets, 19-15 
transactions, 19-16 
slave, 17-1 
snooper, 17-1 
snooping, 3-5 
transaction, VBus, 18-6 
watcher, 16-3, 19-6, 19-46 
BYPASS, 21-7, 21-10 
instruction, 21-14

scan chain, 21-10 
byte 

access, 2-5, 13-8 

reference, 5-8 


C bit, 10-37, 14-25 
cache 
arrays, 13-7 
block, 16-14 
block shared, 17-8 
coherence protocols, 17-2, 17-3 
coherent support, 3-4 
column redundancy repair circuits, 13-5 
consistency, VBus, 18-6 
consistency protocol, 16-9, 17-16, 19-21 
consistency state, 17-18 
control registers, 2-6 
controller, 13-9, 16-9 
disable read, 18-27 
disable write or non-cacheable write, 18-35 
enable bit, 13-10, 17-41 
lookup, 5-5, 5-10 
miss, 5-5 
miss penalties, 6-2 
RAM, 3-5 
redundancy logic, 13-6 
size bit, 16-15 
split I & D, 2-7
sub-block states, 19-21 
XBus configuration, 19-21 
cacheability, 10-7, 10-21 
cacheable 
bit, 10-41 
page, 9-4, 9-37 
single read hit, 18-19 
single read miss, 18-20 
write hit, 18-28 
cached entries, 3-7 
caches, 14-22 
caches/store buffer, 10-1 
CALL instruction, 5-23 
calls, 5-19
CAPTURE operation, 21-5, 21-6, 21-8, 21-11
captured non-emulation PC pair, 22-18 
cascade 
conditions, 5-4 
instruction group, 6-14
CBFEN, 15-9 
CBKEN, 15-9 
CBKFS, 15-10 
CBKIS, 15-10 
CBKM, 22-8, 22-9 
CC error, 16-43 
CCNT, 15-10
CCNTEN, 15-11
CCOP field, 16-43, 16-44 
CE bit, 17-41 
cexc, 4-12, 4-15 
chip pins, 21-10 
CI
operation, 17-44 
transaction, 17-14, 17-18, 17-25, 17-26, 17-28, 17-36, 17-37, 17-41, 17-44, 17-45 
CID, 21-7 
CID primary register’s scan chain, 21-10 
circuit-switched bus, 19-4
clean stable clock sources, 23-6 
CLK, 17-13 
clock 
clocking, 23-1 
delayed, 2-2 
domain, 16-11 
duty cycle, 23-6 
external, 23-2 
feedback loop, PLL, 23-2 
input, 23-1, 23-2, 23-6 
PLL, 23-3, 23-4 
jitter, 21-13 
pin, external, 23-2 
requirements, iv, 23-1 
routing, internal, 23-2 
signal, 23-2 
skew, system, 23-1 
sources, 23-6 
code 
address, 15-2 
and data address breakpoints, 22-25 
breakpoints, 22-25 
generated breakpoint, 12-12 
generation, 6-6 
generation issues, 6-7 
performance, 6-3 
profiling, 15-2 
coherence algorithms, 3-5 
coherent 
cache multiprocessing, 17-2
invalidate transaction, 17-14 
read and invalidate transaction, 17-14 
read transaction, 17-14 
write transaction, 17-14 
command 
logic, 16-9 
word signals, MBus, 17-9 
common 
emulator 
functions, 22-19 
primitives, 22-19 
level-two 
TDI, 21-22 
TDO, 21-22 
component ID, 21-10, 21-15
compound emulation protocol commands, 22-17
compound MCMD mode, 22-18 
condition codes, 4-4, 5-6 
conditional branches, 5-6, 5-18 
configuration space, 16-29 
context numbers, 9-3 
context register, 9-3, 9-24 
context table pointer register, 9-24 
control register, 9-22, 14-21 
control space, 16-17, 16-21 
access, 16-12 
access error, 9-27, 9-30 
annulled, 2-7 
control transfer, 2-5, 5-4 
instruction couple, 2-7, 5-21 
instructions, 22-16 
mechanism, 5-19 
operation, 12-15 
delayed, 4-9 
coprocessor, 2-2 
-disabled trap, 12-14 
instructions, 4-5 
interface, 12-14 
operate, 2-5 
support, 2-2 
SPARC architecture, 2-4-2-5 
copy-back 
cache, 5-6 
data, 10-35 
mode, 10-21 
policy, 5-6 
write-allocate caching policy, 10-35 
copy-out, 12-12 
counter, 21-2
cycle, 3-5 
instruction, 3-5 
breakpoint, 15-3, 15-7 
control, 15-11 
registers, 12-12 
status, 15-11 
value, 15-10 
-generated breakpoint, 12-12 
CP, 2-2, 2-3, 2-4 
-disabled trap, 4-5, 12-14 
error, 16-43, 16-44 
CP. instruction trap, 12-14 
CPI, 3-7 
CR 
operation, 17-23, 17-25, 17-38 
transaction, 17-8, 17-14, 17-15, 17-16, 17-23, 17-26, 17-27, 17-35, 17-36, 17-37, 17-43, 17-44, 17-45 
CRI, 17-28 
transaction, 17-8, 17-14, 17-15, 17-25, 17-26, 17-35, 17-36, 17-37, 17-41, 17-43, 17-45 
CS bit, 16-15 
CSPACE, 15-9 
CTI, 5-21, 6-22, 10-11 
CTRC, 15-11 
CTRS, 15-11 
CTRV, 15-10
CTRV.CCNT, 15-11 
CTRV.ICNT, 15-11 
current 
exception field, 4-12 
window pointer, 2-3, 2-11 
CWP, 2-3, 4-3, 4-5, 4-6, 5-23, 5-24, 12-14 
change, 5-23 
pipeline, 5-23 
value, 5-23 
cycle 
count underflow, 15-2 
counter, 3-5 
breakpoint, 22-28 
interrupt, 15-12 
cycles, 19-15 


d register addresses, FPU operation, 11-3 
daisy chains, parallel, 21-22
data 

register dependencies, 6-2 

access 
errors, 9-27
exception trap, 12-15 
address, 15-2, 22-9, 22-25 
cache, 5-5, 5-6, 5-11, 10-26, 13-5, 14-12 
consistency, 10-26 
control, 10-29 
data, 10-33 
enable, 9-24 
flash clear, 10-29 
diagnostics, 10-29 
high integration, Viking introduction, 3-4 
support routines, 14-17
tags, 10-30 
commands, 19-23 
formats, 2-4 
forwarding, 5-6, 6-13
-generated breakpoint, 12-12 
operands, 5-5 
store error handler code, 10-37
store error trap, 12-13 
data store error, 12-13 
datascan instruction, JTAG instruction register, 21-16 
DBFEN, 15-9 
DBKFS, 15-10 
DBKIS, 15-10 
DBKM, 22-8, 22-9 
DBREN, 15-9 
DBWEN, 15-9 
debug, 15-2 
analysis, 3-5 
interrupt, 15-2 
default grantee, 19-46, 19-47 
deferred floating-point traps, 5-13, 12-15 
delay 
group of a branch, 6-22 
instruction, 4-9 
instructions allocation, 6-7 
delayed control transfer, 2-2, 4-9 
demand fetch, 10-12 
DeMap operation, 8-8 
dependent FPop, 5-14 
destination 
address, 16-30 
data formats, 11-9 
field, 2-5 
register, 2-5, 6-13 
device manufacturing correctness, 21-2 
diagnostic
emulation, 15-1, 22-1 
operation (emulation), 15-1, 22-1 
registers, 13-8, 15-7
dirty bit, 10-32 
divide, 5-14 
divide by zero trap mask, 4-13 
double floating-point registers, 6-4 
double-precision 
dividend, 4-8 
operand, 4-11 
value, 2-3 
values, 2-3 
double-word, 2-5, 13-8, 16-5 
load, 4-2 
reads, 22-22 
reference, 5-8 
references, 12-15 
size, 22-25 
drain pointer, 10-43 
DRCAPTURE, 21-6, 21-10 
DRSHIFT, 21-10, 21-13 
DRUPDATE, 21-10 
dual-port register files, 16-11 
duty cycle 
clock, 23-6
input, 23-6 


E-cache, 16-4, 16-14, 17-18, 17-41 
data access, 16-15 
tag entry, 16-18 

E1, 5-6 

EC, 4-5 

EC bit, 4-5 

ECC errors, 9-30 

ECC problem, 12-13 

ECHOTMR, 22-8 

EF bit, 2-4, 4-5 

eight-byte boundaries, 2-5 

eight-doubleword entry store buffer, 3-4 

emulation, 21-12 
command and instruction, 22-5 
counter status register, 15-12 
data memory reference instruction, 22-16 
data out, 22-7 
entering, 22-14
entry, 22-14 
execution, faults during, 22-15, 22-16 
exit PC/NPC registers, 22-12 
instruction, 21-2, 22-16, 22-17, 22-18, 22-30 
last, 22-18 
legal and illegal, 22-16 
previous, 22-18 
scan in, 22-30 
scan out, 22-30 
sequence, 22-12, 22-19, 22-20 
mode, 13-2, 22-11, 22-15, 22-16, 22-17, 22-18, 22-19, 22-25 
operation, 22-15, 22-20 
primitive, 22-30 
register domain, 21-7, 21-9 
registers, 22-4, 22-12 
remote, 22-18 
sequence, 22-17, 22-19 
session, 22-18 
first, 22-18 
second, 22-18 
software, 22-23
status, 22-8 
strategy, 22-2 
EN, 9-24 
enable 
coprocessor bit, 12-14 
floating-point bit, 2-4 
trap field, 5-25 
traps bit, 12-11 
end reset, 14-27 
entering emulation, details, 22-14 
entry number, 10-40, 10-42 


entry type, 9-5, 9-38 
EPROM, 20-1 
ERR field, 16-44 
ERRMODE, 22-10 
ERR, 16-43, 16-44 
ERR bit, 16-43 
ERRMODE bit, 22-10 
error 
code, 16-44 
handling, MXCC, 16-36 
mode, 9-27, 12-10, 22-16, 22-17, 22-19 
bit, 13-5, 13-6 
condition, 12-14 
reset, 14-4 
reset taken, 9-30 
register, MXCC, 16-42 
reply, 16-44
reporting, VBus, 18-6 
traps, 12-9 
ERROR1 
acknowledgement, 17-31 
response, 17-39, 17-47 
ERROR2 
acknowledgement, 17-28 
response, 17-39, 17-47 
ERROR3
reply, 17-40, 17-47 
response, 17-39, 17-47 
ESB, 15-5 
ESB pin, 15-3, 15-13, 15-15, 22-26 
ET, 4-5 
ET bit, 12-9, 16-43, 16-44 
even-odd pair of registers, 2-3 
event-dependent bit, 15-5 
exception, 9-14, 12-7, 15-2 
handling, 5-24, 12-7 
mode, 4-16 
next program counter, 12-7 
pipeline, 5-24 
program counter, 12-7 
sources, 5-5 
and the pipeline, 5-24 
execution pipeline, blocking, 10-35 
execution stages, 5-5 
existing code performance, 6-3 
exit, 22-17 
extended cache, 16-14 
extended opcode space, implementation-dependent, 4-10 
external 
analysis equipment, 3-5 
bus 
error, 9-33 
request, 16-9 
cache, 3-4, 5-6, 16-4 
controller, iv, 16-1 
RAM, 16-4 
support, high integration, Viking introduction, 3-4 
tags, 16-17 
clock, 23-2 
decoder, 20-7 
device, 15-15 
interrupt requests, 12-11 
monitors, 15-14 
strobe pin, 15-3 
FAR, 12-13
fast load and store instructions, 3-7 
fault 
address register, 9-34, 12-13 
address valid bit, 9-34 
status register, 9-34


type, 9-32 
during emulation execution, 22-16 


faulty emulation status indication, 22-16 

FBicc, 4-14, 5-14, 5-16, 11-2 

FCmp instruction, 4-14, 5-14, 6-11 

FCmp-FBicc pair, 6-11

FCMPE instruction, 4-14 

FE stage, 5-15 

feedback loop, clock, PLL, 23-2 

fetch, 5-4 

FIFO, 10-11 

FIFO queue, 3-4, 5-16, 16-11 

file write, integer register, 22-21 

fill pointer, 10-43 

filter circuit, recommended, 23-3, 23-4 

first emulation session, 22-18 

flash clear, 10-17 

floating-point, 5-16 

arithmetic 
binary, 2-4 
IEEE Standard 754, 2-2 
IEEE Standard 754-1985, 2-4 
floating-point arithmetic conditions, 12-15 
floating-point arithmetic functions, single- and double-precision, high-performance, 3-4 
floating-point arithmetic instructions, 5-16 
branches, 4-14 


control register, reading, 22-23 
data formats, 2-4 
-disabled trap, 12-14 
events, 5-13, 5-16 
exception, iv, 11-1 
details, 11-4 
exception trap, 12-15 
exception trap type, 4-14 
f registers, register summary, 4-11 
implementation, 3-8 
instruction, 2-4, 4-5, 11-2 
most recently executed, 4-15
instruction sets, 2-4 
LD/ST, 6-19 
load/store instructions, 2-4 
moves, 5-16 
operate, 2-4, 2-5 
instructions, SPARC architecture, 2-8 
operation scheduling, 6-2 
operations instruction, 5-6, 5-13, 5-16 
pipeline, 5-13 
queue, 4-16, 5-16, 12-15 
FPU operation, 11-8 
register summary, 4-16 
interface, iv, 11-1 
register, 2-2, 2-3, 5-13 
dependency, 6-8, 6-13 
read, 22-23, 22-25 
register file, 2-2 
state, 4-12, 22-25 
write, 22-24 
transfers, 3-7 
traps, deferred, 5-13 
unit (FPU), 2-2, 2-3, 2-4, 3-4, 4-5, 5-6, 5-16, 14-27, 22-11
accrued exception field, 4-15 
high integration, Viking introduction, 3-4 
operation, 11-1 
register, 2-3, 4-13, 5-16 
SPARC architecture, 2-3-2-4 
flow control, 19-45 
bus watchers, 19-46 
flow of execution, 22-16 
flush (IFLUSH), 7-5 
FLUSH instruction, 2-7, 10-13 
FMULS, 6-9 
format 1, A-4 
format 2, A-5 
format 3, A-7 
four-byte boundaries, 2-5 
FP 
data dependencies, 6-8 
exception, 22-11, 22-25 
pipeline, 5-6 
queue, 22-25 
register dependencies, 6-8 
fp-disabled trap, 4-5 
fp_exception trap, 4-12, 4-15
FPev instruction, 5-13, 5-16 
FPop, 2-4, 5-13, 22-25 
dependent, 5-14 
instruction, 4-12, 4-13, 5-16
latency, 6-10 
FQ, 22-11, 22-25 
FQE, 22-11 
FRD stage, 5-15 
freg dependency, 6-12, 6-13
FSR, 4-12, 4-16, 11-3, 22-25 
FSR.IMPL field, 11-3 
FSR.QNE, 5-16 
FSR.VER field, 11-3 
ftt, 4-14 
full testability, high integration, Viking introduction, 3-6 
Futurebus, 17-2
FWB stage, 6-13 


general-purpose 
integer registers, 2-2 
general-purpose programs, 2-3 
gradual underflow, 2-4


half cache bit, 16-15
half-word, 2-5
access, 2-5, 13-8 
reads, 22-22 
reference, 5-8, 12-15 
(16-bit) access, 2-5 
hardware 
-assisted software memory-scrubbing scheme, 16-30 
breakpoints, 3-5 
-interrupt requests, 12-11, 12-16 
replacement algorithm, 10-17, 10-33 
reset, 12-13, 13-2, 13-5, 13-6, 14-6, 22-9 
reset requirements, 13-5 
reset response, 13-5 
/software page table consistency, 8-8
use of page tables, 8-8 
Harvard architecture, 2-7 
HC bit, 16-15 
heinous modes, 13-6 
high byte, 20-6 
high integration, Viking introduction, 3-2 
high-order bits, 5-5 
high-performance floating-point arithmetic functions, 3-4 
highly dependent operation, 5-14 
hit
criteria, 9-11 
ratio, 16-23 


I&D memories, 2-7 
icc, 4-4 
icc.c, 4-5 
icc.n, 4-4 
icc.v, 4-4 
icc.z, 4-4
icc-modifying instruction, 4-4 
ICNT, 15-10 
ICNTEN, 15-11 
IDIV, 5-17, 7-3 
WRPSR, 7-4 
idle command, 20-6 
IEN_CBK, 15-13 
IEN_DBK, 15-13
IEN_ZCC, 15-13
IEN_ZIC, 15-13
IFLUSH instruction, 7-5, 14-26, 15-6 
illegal 
instruction trap, 2-4, 4-2, 4-5, 12-14, 12-16 
instructions, 22-16 
opcode, 12-14 
operations, 22-17 
implementation number, 9-22, 22-19 
implementation-dependent, 2-3 
improperly aligned address, 2-5 
IMUL, 5-17, 7-2 
IMUL/DIV operations instructions, 5-16 
incoming processor commands, 16-9 
independent scan rings, 21-12 
inexact trap mask, 4-13 
infinitely exact result, 4-13 
inhibit transaction, 17-14 
initiating BIST, 13-7 
input 
clock, 23-2, 23-6 
duty cycle, 23-6 
queue, 16-10, 16-11 
input/output 
RDASR, SPARC architecture, 2-10 
SPARC architecture, 2-10 
instruction
FLUSH, 2-7 
Ticc, 2-11 
access errors, 9-26 
access exception trap, 12-13 
access faults, 9-34 
cache, 6-19, 10-5, 10-12, 10-13, 10-18, 13-5, 14-11 
high integration, Viking introduction, 3-4 
consistency, 10-13 
controls, 10-14 
data, 10-17 
diagnostics, 10-14 
enable, 9-24 
replacement policy, 10-9, 10-22 
support routines, 14-14 
tags, 10-15 
count underflow, 15-2 
counter, 3-5 
counter breakpoint, 22-28 
execution, 2-3 
fields, instruction summary, A-2 
formats, 2-2 
grouping, 6-16 
classes of rules 
exceptions, 6-16 
split after, 6-16
split before, 6-16 
operands, 2-3 
ordering, 6-2 
prefetching, 10-11, 10-12 
queue, 6-19, 10-11 
queue prefetch buffer, 5-4 
register, 21-8, 22-2 
sets, floating-point, 2-4 
space, RDASR-reserved, 4-10 
summary, instruction fields, A-2 


instruction access exception, 12-13 


instructions 

arithmetic, SPARC architecture, 2-7 

control transfer 
annulled, SPARC architecture, 2-7-2-8
delayed, SPARC architecture, 2-7-2-8 
SPARC architecture, 2-7-2-8 

coprocessor operate, SPARC architecture, 2-8 

floating-point operate, SPARC architecture, 2-8 

flush (IFLUSH), 7-5 

integer divide (IDIV), 7-3 
write PSR (WRPSR), 7-4 

integer multiply (IMUL), 7-2 

load/store 
alternate, 2-6-2-14
SPARC architecture, 2-5-2-7 
logical, SPARC architecture, 2-7 
memory access, SPARC architecture, 2-5-2-7 
per cycle, 3-7, 6-3 
SETHI, SPARC architecture, 2-7 
shift, SPARC architecture, 2-7 
signal user emulation request (SIGM), 7-7 
SPARC architecture, 2-5 
state register access
ancillary state, SPARC architecture, 2-8 
SPARC architecture, 2-8 
store barrier (STBAR), 7-6 
tagged arithmetic, 2-7 
that modify the condition codes, 4-4 
integer 
arithmetic units, 6-5 
divide, 7-3, 7-4, 11-7, 12-14 
by zero trap, 12-16 
instruction, 3-4, 12-14, 12-16 
LD/ST, 6-19 
load and store instructions, 2-5 
multiply, 4-8, 7-2, 12-14 
FPU operation, 11-7 
instructions, 3-4 
integer divide operations instructions, 5-16 
register, 22-20, 22-21 
file read, 22-20 
file write, 22-21 
state, 22-21 
state 
register, 22-21 
read, 22-21 
write, 22-21 
PSR, 22-21 
TBR, 22-21 
WIM, 22-21 
Y, 22-21 
unit, 2-2 
SPARC architecture, 2-3 
SPARC-compatible, 3-3 
r registers, 4-2 
interconnect testing, 21-2 
interface logic, 16-10 


internal 
BIST operation, 13-7 
cache, 5-6 
clock routing, 23-2 
error code, 9-33
interrupt sources, traps, 12-12 
logic, 23-2 
routing delay, 23-2 
sequencer, 21-4 
INTERNAL_SCAN, 21-13
interrupt, 5-25 
request, 2-11 
command, 20-7 
generation register, 16-34 
latency, 12-11 
traps, 12-11 
levels, traps, 12-16 
mask register, 16-33 
pending 
clear register, 16-34 
register, 16-33 
request level, 5-25, 17-9 
interrupted program counter, 5-25 
introduction 
JTAG serial scan interface, 21-2 
MultiCache Controller, 16-2 
to SuperSPARC, 1-1, 1-2 
traps, 12-2 
VBus, 18-2 
intscan instruction, JTAG instruction register, 21-16 
invalid 
address error code, 9-32, 9-33 
operation 
exception, 4-13 
trap mask, 4-13 
register window, 12-14 
invalidate transaction, 17-14 
inverted parity, 13-9 
IPC, 3-7, 6-3 
IPND, 22-9 
IQ, 10-11 
IR, 21-8, 21-10, 22-4 
register, 22-4 
scan chain, 21-3 
scan ring, 21-8 
update register, 21-8 
IRL, 5-25 
IRUPDATE state, 21-8 
IU, 2-2, 2-3, 2-4, 5-5, 5-16 
pipeline, 22-6 
register, 2-3 
JMPL, 5-22, 5-26, 6-20, 6-22
branching, 15-11 
couples, 6-22 
JMPL/RETT pairs, 6-22 
JTAG, 22-17 
busmaster, 21-4 
backplane-level, 21-22 
board-level, 21-22 
CAPTURE operation, 21-6 
controller 
second-level, 21-22 
third-level, 21-22 
emulation, 3-5 
instructions, multicache controller, 21-14 
interface, 13-2, 13-7, 21-3, 23-2 
IR, 21-4, 22-2 
MCMD scan register, 15-7
MCMD.INITM bits, 15-8 
MDIN, 22-12 
MDOUT, 22-12 
mechanism, 21-2 
memory references, 21-11 
operations, 21-5 
register domains, 21-7 
reset requirements, 21-4 
scan chain element, 21-6 
scan chain register elements, 21-10 
serial scan interface, 21-1 
serial scan interface mechanism, 21-3 
SHIFT operation, 21-6 
status, 21-2 
TAP, 22-6, 22-8 
TAP controller, 21-4, 21-8, 21-11, 23-2 
TAP controller, 22-14
TAP controller reset, 22-7, 22-9, 22-10, 22-11 
TDR Emulation Registers, 22-4 
TDR scan chain, 22-4 
TDR scan operation, 22-4
UPDATE operation, 21-6 
accessible serial scan chains, 21-7 
initiated BIST, 13-7 


jumps, 5-19 


large-windowed register file, 2-2 
last emulation instruction, 22-18 
last pipe stage, 12-8 
latching, 5-4
latency, 6-10 
LCMDS, 20-6
LDA instructions, 22-12 
LDATA, 20-6 
LDD instruction, 5-24 
LDF/STF registers instruction, 5-16 
LDFSR instruction, 4-12, 4-14 
LDFSR/STFSR register instructions, 5-16 
LDST (load and store), VBus transactions and waveforms, 18-43 
LDSTUB instruction, 8-4, 15-3, 17-34, 17-42 
LDSTUBA instruction, B-1 
least significant bit, 4-2 
legal and illegal emulation instructions, 22-16 
level, 9-31 
level 2 
signals, 17-2 
TDI, common, 21-22 
PTP2 cache, 9-10 
TDO, common, 21-22 
linear mapping, 9-6 
linked list traversal, 6-15 
load, 5-14 
load alternate, 8-6 
load and store alternates, 8-6 
load and store instructions, 2-5, 3-7, 5-16
load operation, 5-8, 12-15 
local flush 
operation, 8-9 
transactions, 8-9 
lock bit field, 8-4, 10-10, 10-24, 17-42
lock indicator, 17-11 
logic, internal, 23-2
logic analyzer, 15-15 
LONG_BIST, 13-7, 21-7 
low byte, 20-6 
low order address bit, 5-5, 12-15 
low-speed peripherals, 20-1 
LDA/STA, 8-6 


MACK, 22-8 
manufacturing fault coverage, 21-2 
mapping, linear, 9-6 
master, 17-2
maximum interrupt latency, 12-12 
MBSEL pin, 13-12, 17-41, 20-1 
MBus, 17-1 

arbiter, 17-2 

busy, 17-8 

clock frequency, 17-4

command word signals, 17-9 

configuration, 16-9 

configurations, Viking, 17-4 

error, 17-7 

grant, 17-8 

level 2, 16-4 

master clock, 17-6 

mode, 5-6, 9-24

module, 24-1 

mechanical information, 24-9 

module identifier, 17-10 

module schematics, 24-3 

operation, 17-13 

overview, 17-2 

ready, 17-7 

reference clock, 17-13 

request, 17-8 

retry, 17-7 

signals, 17-6 

standard, level 2, 3-4 

system, full module, 24-2 

timing summary, 17-48 

transactions, 10-28 
MBus/XBus interface, 16-10 
MXCC, 13-9, 16-1, 23-4, 23-6 

basic functionality, 16-8 

BIST register, 16-23 

block copy, 16-30 

block diagram, 16-8 

built-in self-test register, 16-23 

consistency, 17-18 

control register, 16-24 

Error Handling, 16-36 

error register, 16-42 

internal registers, 16-21 

interrupts, 16-33 

MBus port register, 16-29 

pins, D-2 

reset register, 13-10, 13-12, 16-28 

SRAM access, 18-49 

status register, 16-26 

synchronous and asynchronous operation, 23-7 
system reset, 13-10
MCCCB.PE, 13-9, 14-25
MCI, 21-7, 21-12, 22-17 
register, 22-9 
scan chain, 21-9, 21-13 
TDR, 22-19 
MCI.MINST register, 22-17 
MCMD 
mode, compound, 22-18 
register, 22-19 
MCMD.INTM bit, 15-5 
MCMD.MENTER, 22-8, 22-9, 22-14, 22-19 
MCNTL, 9-22, 13-9
MCNTL.AC, 14-25
MCNTL.AC bit, 8-6, 10-37, 13-4
MCNTL.BT bit, 12-9, 13-4, 13-6
MCNTL.MB bit, 10-21
MCNTL.NF bit, 9-20, 10-37
MCNTL.PE, 13-9, 14-25
MCNTL.PE bit, 13-9
MCNTL.PSO bit, 8-3
MCNTL.SB, 8-2
MCNTL.SB bit, 8-3, 10-42
MCNTL.SE, 10-27
MCNTL.SE bit, 10-13
MCNTL.TC, 14-25
MCTP, 9-24 
MDIAG.BKC register, 12-12 
MDIN, 21-7, 21-12, 22-12, 22-19, 22-23, 22-25 
register, 22-12, 22-23 
scan chain, 21-9, 21-13 
MDOUT, 21-7, 21-12, 21-13, 22-7, 22-12, 22-25 
port, 22-13 
register, 22-13, 22-20, 22-21, 22-22, 22-23 
scan chain, 21-9, 21-13 
mechanical information, MBus module, 24-9 
memories, separate I&D, 2-7
memory, 2-4, 5-2 
addressing, conventions, 2-5 
alignment restrictions, 2-5 
access, instructions, SPARC architecture, 2-5 
address, 2-4 
address and data, 17-6 
address not aligned trap, 12-15 
address register ports, 6-19 
address strobe, 17-7 
and I/O transactions, VBus, 18-7
controllers, 17-2
inhibit, 17-8 
model, 2-2, 8-1 
SPARC architecture, 2-9 
support, 8-11 
read, 22-22 
reference, 4-16, 5-4, 5-8, 6-14, 8-7
state, 21-2
write, 22-23 
memory-mapped emulation counter status register, 15-12 
memory-reference patterns, 6-2 
MENTER register, 22-17 
message priority detection, 19-48 
MEXEC, 22-6, 22-7, 22-17, 22-18 
MEXIT bit, 22-7, 22-10, 22-17, 22-18 
MFAR, 9-29, 9-34, 9-35, 22-10 
MFSR, 9-26, 9-28, 9-29, 9-34, 14-22, 22-10 
error bits, 10-37 
fault status register, 9-20 
register description, 9-30 
timing and operation, 9-28 
MFSR.CS bit, 9-27 
MFSR.CS status bit, 9-27 
MFSR.EM bit, 13-5, 13-6 
MFSR.FT bit, 9-29 
MFSR.FT field, 9-26, 9-28 
MFSR.OW, 9-29 
MFSR.SB bit, 9-27 
MFSR.SB error bit, 9-28 
MFSR.UC bit, 9-30 
MID, 17-10 
middle byte, 20-6 
MIDONE bit, 22-10, 22-11 
MIFLTD, 22-10 
MIFLTD status bit, 22-10 
MINST instruction, 22-16, 22-19 
misaligned destination register, 4-2 
miss penalties, TLB, 3-4 
MIX, 15-12 
MMU, 3-4, 5-5, 9-1-9-33 
address, translation modes, 9-19 
breakpoint control registers, 15-7 
consistency, VBus, 18-9 
control register, 9-22, 10-6, 14-25 
control registers, 9-22 
enable, 9-24 
fault status register, 9-26 
flush operation, 8-8
hit criteria, 9-11 
modified bits, 9-8 
operation, 5-10, 9-1-9-33 
probe, 9-12 
protection errors, 12-13, 12-15 
R&M updates, 10-35 
referenced bits, 9-8 
registers, 2-6, 9-22 
shadow FSR register, 9-35 
table walk, 10-12 
TLB, 9-36 
transparent mode, 9-18 
MMU.SB, 10-38 
models of memory, 8-2 
modified bit, 9-4, 9-8, 9-38 
module asynchronous error detect, 17-9 
module 
ID, 17-15 
identifier, 17-9 
reset input, 17-9 
MOESI protocols, 17-2 
MPC/MNPC, 22-12 
MRESET, 22-10 
MSB, 4-5 
MSFSR register, 9-35, 22-10 
MSTAT, 21-7, 21-12, 21-13, 22-2, 22-8, 22-18, 22-19 
register, 21-13, 22-8 
scan chain, 21-9, 22-8 
update operation, 22-10 
MSTAT.ERRMODE, 22-10 
MSTAT.MACK, 22-18 
MSTAT.MIDONE, 22-18 
MSTAT.MIFLTD, 22-16, 22-18
MSTAT.MIFLTD bit, 22-16
MSTAT.PFPX, 22-25
MTMP[1-2], 22-12
MTMP[1-2] registers, 22-12
MTMP1 register, 22-21
MULScc, 4-8 
multicache controller, 13-12, 14-23, 21-14 
multiple-instructions-per-cycle execution, 3-7 
multiply/divide register, register summary, 4-8 
multiprocessor 
synchronization instructions, 2-2 
multiprocessor system, 17-2
NaN, 4-14, 11-3
NEW DATA VALUE, 22-23 
next program counter (nPC), 2-11, 4-9, 5-26, 22-6, 22-7 
NF bit, 9-20, 12-9 
no default 

grantee, 19-47 

no fault

bit, 9-24, 10-37, 12-9

operation, 9-20 
non-cacheable 

accesses, 16-10 

stores, 10-35 
non-emulation watchdog reset, 22-17 
non-maskable

interrupts, 12-11 

requests, 12-11 
non-standard mode, 4-13, 11-6 
non-cacheable 

loads, 8-7 

stores, 8-7 
non-writable registers, 21-6 
NOPs, 22-6 
NS, 4-13 
numeric cases, iv, 11-1 
NWINDOWS, 2-3 


odd destination register, 4-2 
odd-even pair of registers, 2-3 
on-chip PLL, 21-13 
one stall, 6-10 
operands, 2-2, 5-5 
operation, 5-1, 6-1 

introduction, 5-2 
operation code, 16-43, 16-44 
oscilloscope, 15-15 
overall operation (processor), 2-3 
overflow 

bit, 4-4 

flag, 12-15 

trap mask, 4-13 
overview of pipeline example, 5-7 
overwrite bit, 9-29, 9-34 
PA
bit, 16-43 
field, 16-43, 16-44 
packet detection, 19-16 
packet-switched 
bus, 19-4 
interface, 3-4 
packets, 19-15 
page table 
entry, 9-4, 9-31 
memory operations, 8-8 
pointer, 3-4, 9-4, 9-5 
hardware use of, 8-8 
PAMD, 15-9 
parallel daisy chains, 21-22 
parity, 12-13 
enable, 9-24 
error, 9-30 
partial store ordering, 3-5, 8-2, 9-24 
pass through/bypass transactions, 10-22 
PC, 4-7, 4-9, 5-2, 5-25, 22-6, 22-7 
pair 
altered non-emulation, 22-18 
captured non-emulation, 22-18 
value, 5-5, 5-23 
PCLK, 16-4, 17-42, 23-7 
PCLK and BCLK relationship, 23-7
РЕМО , 8-11 
pending bit, 13-10 
performance, 6-2 
performance analysis, 3-5, 15-2 
periphery testing, 21-2 
PFPX, 22-11, 22-25 
phase locked loop, 23-1, 23-2 
physical address, 9-3, 16-43, 16-44 
bits, 10-32 
cache, 3-4 
physical page number, 9-3, 9-4, 9-37 
physical signal summary (MBus signals), 17-6 
physical value, 22-25 
PIL, 4-5, 15-13 
pin timings, 23-6 
pins, MXCC, D-2 
pipeline, 3-4, 5-5 
bubble, 5-6 




bubbles, 5-4 
example overview, 5-7 
fundamentals, 5-4
hold cycle, 5-26 
write-back, 5-6 
pipelined execution, 5-2 
PLL, 21-13, 23-1, 23-2 
bypassed, 23-6 
clock, 21-13, 23-3, 23-4 
clock feedback loop, 23-2
enabled, 23-6 
instability, 23-2 
restabilization, 21-11 
port busy state, 17-29 
ports 
integer register write, 6-4 
to memory, 6-4 
to the FPU, 6-4 
possible emulation instruction sequences, 22-19 
POST, 14-8 
internal storage devices, 14-8 
data cache, 14-12 
instruction cache, 14-11 
other POSTs, 14-13 
windowed register file, 14-8 
support routines, 14-13 
data cache, 14-17 
instruction cache, 14-14 
power-on reset, 10-6, 10-16, 14-6, 15-12 
power-on self-test, 14-8 
PPN, 9-3, 9-4, 9-37 
prefetch, 8-10 
buffers, 13-7 
exception handling, 8-10 
exceptions, 10-12 
previous emulation instruction, 22-18 
primary 
execution stage, 5-5 
register, 21-6, 21-10, 21-13 
register's scan chain, CID, 21-10 
scan register, 21-10 
primitives, 22-19 
principles of operation, 5-1, 6-1 
prior non-emulation FPOPs, 22-11 
privilege error code, 9-32 
privileged 
instructions, 2-3, 12-13 
load/store alternate instructions, 2-6 
probe address, 9-12
procedure call and return, 5-23 
processor 
arbitration logic, 16-10 
bus interface, 16-10 
command logic, 16-9, 16-10 
pipeline, 5-4 
state register, 2-3, 4-4, 4-8, 22-19
status register, 6-21 
processor-dependent values, 2-6 
processor-state registers, 2-6 
program counter (PC), 2-3, 2-11, 4-7, 4-9, 5-2, 5-5, 12-7, 21-2 
program execution 
block-stepping, 22-28 
profile, 22-28 
single-stepping, 22-28 
programmable timers, 15-2 
programs, self-modifying, 2-7 
protected registers, 2-6 
protocol commands, compound emulation, 22-17 
PS, 4-5 
pseudo-random test vectors, 21-11 
PSO, 3-5, 8-2, 8-3, 14-25 
PSR, 2-4, 4-4, 5-24, 6-21, 14-21, 22-21 
PSR.EC bit, 12-14 
PSR.EF, 12-14 
PSR.ET, 5-25, 12-10 
PSR.ET bit, 10-36, 12-11, 12-14 
PSR.IPL, 22-9 
PSR.PIL, 5-25, 12-11 
PSR.PS bit, 12-9 
PSR.S, 12-13 
Ptag format, 10-15, 10-30 
PTE, 9-4, 9-14, 9-16, 9-23, 9-31, 9-32, 9-33, 10-22 
PTP, 3-4, 9-4, 9-14, 9-31, 9-32 
ptp, 9-5 


quad precision and extended precision, FPU operation, 11-4 
quad-precision value, 2-3 
queue, 4-14 


R&M update, 9-8 
R&R
acknowledgement, 17-25, 17-27, 17-28, 17-29 
reply, 17-34, 17-40, 17-42, 17-47 
ratio of PCLK to BCLK, 23-7 
RD, 4-12 
RDASR 
RDPSR, 4-4, 4-5, 6-21 
RDTBR instruction, 4-7 
RDWIM instruction, 4-6 
RDY, 4-8 
read operands, 5-5 
read valid command, 20-6 
read/write control register, 2-5 
reads 
double-word, 22-22 
half-word, 22-22 
signed, 22-22 
unsigned, 22-22 
word, 22-22 
READY response, 17-40 
receiver delay, 23-2 
recovery mechanism, 21-2 
reduction of branches, 6-7 
reference clock, MBus, 17-13 
reference/miss count register, 16-23 
referenced bits, 9-8 
register, 5-2, 9-22 
file, 13-7 
addresses, 2-2 
dual-port, 16-11 
floating-point, 2-2 
large-windowed, 2-2 
ports, 5-5, 6-19 
write port, 5-23 
global, integer unit, 2-3 
in, 2-3 
local, integer unit, 2-3 
out, 2-3 
pairs, 2-3 
state, 21-2 
summary, 4-1 
ancillary state register, 4-10 
floating-point f registers, 4-11 
floating-point queue, 4-16 
floating-point state registers, 4-12 
integer unit r registers, 4-2 
multiply/divide register, 4-8 
processor state register, 4-4 
program counters, 4-9
trap base register, 4-7 
window invalid mask, 4-6 
windows, 2-11, 4-2 
relinquish & retry reply, 17-34 
remote 
debugging environment, 21-2 
emulation, 21-9, 22-2, 22-18, 22-23
reply queue, 16-11 
request queue, 16-10, 16-11 
reset, 10-36, 14-2, 14-25 
modes, 13-1 
requirements, JTAG, 21-4 
states, 14-21 
caches, 14-22 
control registers, 14-21
end reset, 14-27 
floating-point unit, 14-27 
MFSR, 14-22 
MMU control register, 14-25
multicache controller, 14-23 
PSR, 14-21 
superscalar execution, 14-21 
TLB, 14-22 
trap, 12-13, 13-6 
types, 13-2, 13-4 
resource allocation, 5-5, 6-4 
RESTORE, 4-4, 4-5, 4-6, 5-23, 12-14, 22-21 
retry acknowledgement, 17-27 
RETT, 4-5, 4-6, 5-26, 6-22 
instruction, 4-4, 4-6, 12-9, 12-14 
pipeline, 5-26 
return from trap pipeline, 5-26
ring size, 21-6 
RISC, 3-4, 5-8 
root cache, 9-10 
root pointer cache, 3-4 
rounding operations, FPU operation, 11-4 
routing delay, internal, 23-2 
Run-for-N instructional Cycles, 22-28 


S bit, 4-5, 16-43, 16-44 

SAVE instruction, 4-4, 4-5, 4-6, 5-23, 12-14 
SBCNTL.Dptr, 10-39

SBTAGS.SP, 7-6 
scan chain, 21-6, 21-10
CID primary register's, 21-10 
JTAG, 21-6 
registers, primary, 21-6 
scan rings, independent, 21-12 
SDIV instruction, 4-8 
SDIVcc instruction, 4-8 
second emulation session, 22-18 
second operand, 12-16 
second-level 
JTAG controller, 21-22 
TAP controller, 21-22 
SEE PLL, 21-13 
select, 9-36 
self test, 13-7 
self-aligning nature of execution, 6-7 
separate I&D memories, 2-7
sequence errors, 12-15 
serial scan chains, JTAG-accessible, 21-7 
SETHI, SPARC architecture, instructions, 2-7 
setting code and data address breakpoints, 22-25 
shadow FSR, 22-10 
shared 
bit, 10-32 
signal, 17-14 
write, VBus transactions and waveforms, 18-32 
SHIFT, 2-7, 21-5, 21-6, 21-8 
shift register, 21-6 
shifter, 5-6 
SHORT_BIST, 13-7, 21-7
SI reset, 13-10
signal user emulation request (SIGM), 4-10, 7-7, 22-14 
signature 
register, 21-7, 21-11 
scan chain, 21-11 
value, 13-7 
SIGNATURE TDR, 21-11 
signed and unsigned half-word and word and double-word reads, 22-22
signed 
byte, 22-22 
reads, 22-22 
single registers, 2-3 
single- and double-precision floating-point arithmetic functions, high-performance, 3-4 
single-chip interface, 3-4 
single-point memory references, 6-14 
single-precision 
operand, 4-11
value, 2-3 
single-word wide, 22-25 
SIZE, 17-44 
slave, 17-2 
SMUL, 4-8 
SMULcc, 4-8 
snoop 
enable, 9-23 
hit, 5-6, 10-10, 10-24, 10-27 
on the bus, 5-6 
snoop-enable bit, 10-13 
snooping 
bus, 3-5 
caches, 17-25 
software 
debugging facilities, 15-2 
internal reset, 13-10
MXCC, 13-11 
memory-scrubbing scheme, hardware-assisted, 16-30 
software-controlled boundary scan, 21-2
source 
address, 16-30 
register, 6-13 
space atomics (alternate), 8-5 
SPARC 
architecture 
coprocessor, 2-4 
floating-point unit, 2-3 
input/output, 2-10 
instructions, 2-5 
arithmetic, 2-7 
control transfer, 2-7
coprocessor operate, 2-8 
floating-point operate, 2-8 
load/store, 2-5 
logical, 2-7 
memory access, 2-5 
SETHI, 2-7 
shift, 2-7 
state register access, 2-8
integer unit, 2-3 
introduction, 2-2 
manual, 2-1 
memory model, 2-9 
partial store ordering (PSO), 2-9 
total store ordering (TSO), 2-9 
SPARC reference MMU, 2-13 
traps, 2-11 
summary, 2-1 
assembly language, 21-2
implementation, 2-3 
instruction, 22-16 
instruction set, 22-16 
International, Inc., 2-2
processor, 2-2 
reference memory management unit (MMU), 2-13, 3-4 
control registers, 9-22 
operation, 9-23 
specification, iii, 9-1 
registers, 10-34 
split 
after 
any control transfer instruction, 6-17 
condition codes set in cascade rule, 6-17 
first instruction after annulled branch rule, 6-17
first valid exception rule, 6-16 
MULSCC destination not equal to source of next MULSCC rule, 6-17 
before 
cascade into 
JMPL rule, 6-20 
memory reference address, 6-20 
shift rule, 6-20 
control register read after previous SetCC rule, 6-21
delay group CTI unless first rule, 6-22 
extended arithmetic from CC set in current group, 6-22 
invalid instruction rule, 6-19 
load data cascade use rule, 6-20 
MULSCC unless first one or two instructions rule, 6-22 
out of integer register ports rule, 6-19 
previous group cascade into memory reference address rule, 6-20
second cascade rule, 6-19 
second shift rule, 6-19 
sequential instruction rule, 6-21
between add and shift, 5-9 
spread address calculation, 6-14 
square root, 5-14 
squashed instruction, 5-25-21 
SRAM chips, 16-4 
ST instruction, 3-4, 5-6, 6-10 
STA emulation instruction, 8-6, 10-35, 13-2, 13-7, 22-10 
stable clock sources, 23-6 
Stag format, 10-15, 10-30 
stages, 5-4 
stale data, 17-14 
startup procedure, 14-1 
POST, 14-8 
power-on self-test, 14-8 
reset handling, 14-2 
reset states, 14-21
caches, 14-22 
control registers, 14-21 
end reset, 14-27 
floating-point unit, 14-27 
MFSR, 14-22 
MMU control register, 14-25 
multicache controller, 14-23
PSR, 14-21 
superscalar execution, 14-21 
TLB, 14-22 
state 
during emulation mode, 22-15 
register access 
ancillary state, instructions, SPARC architecture, 2-8 
instructions, SPARC architecture, 2-8 
vectors, signature, 21-9 
STBAR instruction, 4-10, 7-6, 10-41 
STDFQ instruction, 4-14, 4-16, 5-16, 22-25 
STEN_CBK, 15-13
STEN_DBK, 15-13
STEN_ZCC, 15-13
STEN_ZIC, 15-13
STFSR instruction, 4-12, 4-14 
store, 22-21 
address, 5-10 
alternate, 8-6, 10-35 
barrier (STBAR), 7-6, 10-41 
buffer, 8-7, 9-8, 10-26, 10-34, 13-5, 13-7 
data store exceptions, 10-36 
diagnostics and control, 10-40 
disabled operation—strong ordering, 10-39 
general operation, 10-34 
high integration, Viking introduction, 3-4 
non-buffered operations, 10-35 
synchronous operations, 10-35 
control, 10-38 
control register, 10-42 
copy-out, 5-12, 8-4 
data, 10-38 
data register, 10-41 
depth, 12-11 
disabled, 10-42 
empty bit, 10-42 
enable, 9-24 
buffer enabled, 10-42 
error, 9-30 
error pending bit, 10-42 
errors, 9-27 
non-empty, 10-42
snooping, 5-12 
tag, 10-38 
tag register, 10-37, 14-25 
tags register, 10-40 
FSR operations, 12-15, 22-25 
instructions, 3-4 
stream 
data 
buffer, 16-30 
register, 16-31 
destination register, 16-32 
source register, 16-30, 16-31 
stream write operations, 16-32 


STSFR instruction, 4-14 
style and symbol conventions, vii 
sub-block
address, 16-30
states, 19-21
Sun Microsystems, Inc., 2-2
superscalar
design, 3-3
execution, 5-2, 13-5, 14-21
SuperSPARC
configurations, 1-4, 17-2
introduction, 1-1, 1-2
processor, 21-7
supervisor
access indicator, 17-10
bit, 10-41
data space, 22-22
mode, 2-3
software, trap, 2-3
state, 16-43, 16-44, 17-10
supervisor-only instructions, 2-3
supervisor-only page, 9-32
SWAP, 15-3
access, 13-8
instruction, 8-4, 17-34, 17-42
transactions, 9-8
SWAPA instruction, B-1
symbolic constants, 22-20
synchronous
external stores, 12-12
operation, 23-7
SRAM, 16-4
system
clock skew, 23-1 
configurations, SuperSPARC, 1-4 
noise, 23-3, 23-4 
reset, 13-10 
Software use of page tables, 8-8 
system implementation-dependent error, 17-31, 17-32, 19-19, 19-20 
system-dependent values, 2-6 
system-level test, 21-22 


table walk cacheable bit, 9-14, 9-23, 10-12 
TADDCCTV instruction, 12-15
tag
commands, 19-32 
comparison, 5-5 
tagged 
add, 12-15 
data instructions, 2-2 
operation, 4-4 
operation overflow trap, 12-15
subtract, 12-15 
taken branch, 5-19, 5-20, 5-21, 5-23, 6-2, 6-7, 6-17 
TAP 
controller, 21-4, 21-6, 23-2 
JTAG, 21-4, 21-11, 23-2 
second-level, 21-22 
reset, 22-8 
state machine, 21-3 
reset state, 13-2, 21-11 
TBA, 4-7
TBR, 4-7, 22-21 
TC, 9-23 
TCK cycle, 13-2, 21-3, 21-4, 21-6, 21-10, 21-11, 21-22, 22-8, 23-2 
TDI, 21-3, 21-10, 21-13, 21-22 
data, 21-10 
value, 21-6 
TDO, 21-3, 21-6, 21-10, 21-13, 21-22 
TDR scan chain, 21-8 
TEM bit, 4-12, 4-15 
TEM value, 4-12 
TEM.DZM, 4-13 
TEM.NVM, 4-13 
TEM.NXM, 4-13 
TEM.OFM, 4-13 
TEM.UFM, 4-13 
test
access point controller, 21-4 
Clock, 21-3, 22-8, 23-2
mode select, 21-3
vectors, pseudo-random, 21-11
TEST LOGIC RESET state, 21-3, 21-4
JTAG, 21-4
testability, high integration, Viking introduction, 3-6 
third-level JTAG controller, 21-22 
three models of memory, 8-2 
three-stated, 20-6 
throughput, 6-10 
TICC instructions, 4-4, 4-7, 12-16 
time-out, 9-30
error, 16-13
response, 16-13, 17-39, 17-47
timing summary, MBus, 17-48
TLB, 5-5, 9-3, 9-18, 9-36, 10-12, 13-5, 13-7, 14-22
entry, 9-9, 9-36
miss penalties, 3-4
replacement policy, 9-9
TMRM, 22-8, 22-9
TMS, 13-2, 21-3, 21-4, 21-11, 21-22
pin, 23-2
total store ordering, 3-5, 8-2, 9-24
transactions, 19-16
and waveforms, VBus, 18-17
transfer, 16-30
transient requests, 12-11
translation look aside buffer, 9-36
transparent mode, 9-18
trap, 2-5, 12-1
and the store buffer, 12-10
base
address, 4-7
register (TBA), 2-11, 4-7
deferred, 2-11
categories, 2-11
details, 12-13
exception mask, 4-12
handler, 2-11
handling, 2-2
instructions, 12-16
interrupting, 2-11
precise, 2-11
priorities, 12-8
type, 2-11
SPARC architecture, 2-11 
to supervisor software, 2-3 
type, 12-4 
field, 4-7, 12-9 
not implemented, 12-6 
trapped floating-point instruction, 2-4 
tri-state, 20-6 
TSO, 3-5, 8-2 
TSUBCCTV instruction, 12-15 
TT field, 4-7, 12-9 
two-byte boundaries, 2-5 
type field, 9-12 


u, 4-12 
UD/UC/TO/BE errors, 9-30 
UDIV instruction, 4-8 
UDIVcc instruction, 4-8 
UMUL, 4-8
UMULcc, 4-8 
unassigned opcode, 12-14 
uncorrectable 
error, 9-30 
response, 17-39, 17-47 
undefined error, 9-30 
underflow
detection, FPU operations, 11-4 
trap mask, 4-13 
unfinished FPop exceptions, FPU operation, 11-3 
unimplemented instruction trap, 12-14 
uniprocessor system, 17-2 
unordered relation, 4-14 
unsigned reads, 22-22 
untaken branch, 5-19, 5-21, 6-17 
UPDATE operation, 21-5, 21-6, 21-8, 21-10, 21-11, 21-13 
user mode, 2-3 
SPARC architecture, 2-3 
user-application program, 2-3 
user-application programs, 2-3 


valid bit, 10-41, 13-10 
VBus, 18-1 
arbitration, 18-8 
bus keeper, 18-10, 18-18
bus transaction, 18-6 
cache consistency, 18-6 
error reporting, 18-6 
memory and I/O transactions, 18-7 
MMU consistency, 18-9 
signals, 18-10 
SRAM and register reads, 18-46 
transactions and waveforms, 18-17 
burst read hit, 18-22 
burst read miss, 18-24 
cache disable read, 18-27 
cache-disable write or non-cacheable write, 18-35 
cacheable single read hit, 18-19 
cacheable single read miss, 18-20 
cacheable write hit, 18-28 
invalidate, 18-37 
LDST (load and store), 18-43 
shared write, 18-32 
VBus SRAM and register reads, 18-46 
VCK, 22-8 
VCO behavior, 21-13 
version, 4-13 
version number, 4-4, 9-22 
VDPA, 9-12
virtual address, 5-5, 9-3, 17-10 
virtual 
flush, 9-12 
page number, 9-3 
value, 22-25 
VPN, 9-3 


WAIT/RUN_BIST state, 21-11
warnings regarding BIST operation, 13-9
watchdog
reset, 9-33, 12-9, 12-13, 12-14, 13-2, 13-6, 13-10, 14-4, 15-12, 17-9
MXCC, 13-11
non-emulation, 22-17
logic, 17-31
sequence, 22-10
trap, 12-9
waveforms, VBus, 18-17
and transactions, VBus, 18-17
WB, 5-6, 5-10, 5-14, 12-8
WD reset, 13-10
WIM register, 4-6, 12-14, 22-21
window
address, 4-3 
invalid mask, register summary, 2-2, 4-6 
overflow, 4-6, 12-14 
window underflow trap, 12-14 
window overflow trap, 4-6
window underflow trap, 4-6
windowed register file, 14-8
word, 2-5
reads, 22-22 
word reference, 5-8, 12-15 
wrapping, 17-15 
write 
hits in stream transfers, 19-40 
invalidate protocol, 17-14 
memory, 22-22 
valid command, 20-6 
write-allocate, 5-6 
write-back policy, 17-13 
write-invalidate cache-consistency protocol, 17-25 
write-through cache, 5-6 
write-through mode, 10-21 
WRPSR instruction, 4-4, 4-5, 4-7, 12-14 
WRTBR instruction, 4-7 
WRWIM instruction, 4-6 
WRY, 4-8 


XADDR, 22-22, 22-23 
XBus, 3-4, 19-1 
arbiter, 16-11 
arbitration priorities, 19-44 
bus commands, 19-23 
data commands, 19-23 
tag commands, 19-32
bus cycle waveforms, 19-49
bus ownership, 19-46 
default grantee, 19-47 
no default grantee, 19-47 
bus protocol, 19-15 
cycles, 19-15 
packet detection, 19-16 
packets, 19-15 
transactions, 19-16 
cache consistency protocols, 19-21 
sub-block states, 19-21 